Title: | Triple-Difference Estimators |
Version: | 0.1.0 |
Description: | Implements triple-difference (DDD) estimators for both average treatment effects and event-study parameters. Methods include regression adjustment, inverse-probability weighting, and doubly-robust estimators, all of which rely on a conditional DDD parallel-trends assumption and allow covariate adjustment across multiple pre- and post-treatment periods. The methodology is detailed in Ortiz-Villavicencio and Sant'Anna (2025) <doi:10.48550/arXiv.2505.09942>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Depends: | R (≥ 4.3) |
RoxygenNote: | 7.3.2 |
Imports: | BMisc (≥ 1.4.6), data.table (≥ 1.15.0), Matrix (≥ 1.6.1), parallel (≥ 1.4.0), parglm (≥ 0.1.7), Rcpp (≥ 1.0.12), speedglm (≥ 0.3-5) |
LinkingTo: | Rcpp (≥ 1.0.12) |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
URL: | http://marcelortiz.com/triplediff/ |
NeedsCompilation: | yes |
Packaged: | 2025-07-22 16:51:53 UTC; marcelortizv |
Author: | Marcelo Ortiz-Villavicencio [aut, cre], Pedro H. C. Sant'Anna [aut] |
Maintainer: | Marcelo Ortiz-Villavicencio <marcelo.ortiz@emory.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-23 19:10:02 UTC |
Aggregate Group-Time Average Treatment Effects in Staggered Triple-Differences Designs.
Description
agg_ddd takes group-time average treatment effects and aggregates them into a smaller number of summary parameters in staggered triple-differences designs. The available aggregations are "simple", "eventstudy", "group", and "calendar". The default is "eventstudy".
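To make the four aggregation types concrete, the base-R sketch below averages a small table of made-up group-time effects in each of the four ways. The numbers, the column names (`g`, `t`, `att`, `n_g`), the helper `wmean_by`, and the group-size weighting are all assumptions chosen for illustration; `agg_ddd` itself may weight and filter the estimates differently.

```r
# Toy group-time effects (made-up numbers, for illustration only).
attgt <- data.frame(
  g   = c(2, 2, 3, 3),          # first treated period of the group
  t   = c(2, 3, 2, 3),          # calendar period
  att = c(1.0, 1.5, 0.1, 2.0),  # hypothetical ATT(g, t) estimates
  n_g = c(60, 60, 40, 40)       # hypothetical group sizes used as weights
)
attgt$e <- attgt$t - attgt$g    # event time: periods since first treatment

# Hypothetical helper: weighted mean of att within each level of `by`.
wmean_by <- function(d, by) {
  sapply(split(d, d[[by]]), function(s) weighted.mean(s$att, s$n_g))
}

post <- attgt[attgt$e >= 0, ]   # keep post-treatment cells only

simple     <- weighted.mean(post$att, post$n_g)  # one overall number
eventstudy <- wmean_by(attgt, "e")               # one number per event time
group      <- wmean_by(post, "g")                # one number per group
calendar   <- wmean_by(post, "t")                # one number per period
```

With these toy inputs, `simple` is 1.4375 and `eventstudy` is 0.1, 1.4, and 1.5 for event times -1, 0, and 1. Restricting the event-time window, as `min_e` and `max_e` are described as doing, would amount to filtering on `e` before averaging in this sketch.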
Usage
agg_ddd(
ddd_obj,
type = "eventstudy",
balance_e = NULL,
min_e = -Inf,
max_e = Inf,
na.rm = FALSE,
boot = NULL,
nboot = NULL,
cband = NULL,
alpha = 0.05
)
Arguments
ddd_obj |
a |
type |
Which type of aggregated treatment effect parameter to compute.
|
balance_e |
If set (and if one computes event study), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
na.rm |
Logical. Whether to remove missing values from the analysis. Default is FALSE. |
boot |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
nboot |
The number of bootstrap iterations to use. The default is the value set in the ddd object,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alpha |
The significance level for the confidence intervals. The default is 0.05; otherwise, the value set in the ddd object is used. |
Value
An object (list) of class agg_ddd that holds the results of the aggregation step.
Examples
#----------------------------------------------------------
# Triple Diff with multiple time periods
#----------------------------------------------------------
data <- gen_dgp_mult_periods(size = 500, dgp_type = 1)[["data"]]
out <- ddd(yname = "y", tname = "time", idname = "id",
gname = "state", pname = "partition", xformla = ~cov1 + cov2 + cov3 + cov4,
data = data, control_group = "nevertreated", base_period = "varying",
est_method = "dr")
# Simple aggregation
agg_ddd(out, type = "simple", alpha = 0.10)
# Event study aggregation
agg_ddd(out, type = "eventstudy", alpha = 0.10)
# Group aggregation
agg_ddd(out, type = "group", alpha = 0.10)
# Calendar aggregation
agg_ddd(out, type = "calendar", alpha = 0.10)
Doubly robust DDD estimator for ATT, with panel data and 2 periods
Description
This function implements a doubly robust estimator for assessing the average treatment effect on the treated (ATT) using a triple differences (DDD) approach in panel data settings across two time periods. The function takes preprocessed data structured specifically for this analysis.
Usage
att_dr(did_preprocessed)
Arguments
did_preprocessed |
A list containing preprocessed data and specifications for the DDD estimation.
Expected elements include:
- |
Value
A list with the estimated ATT, standard error, upper and lower confidence intervals, and influence function.
Compute Aggregated Treatment Effect Parameters
Description
Does the heavy lifting of computing aggregated group-time average treatment effects.
Usage
compute_aggregation(
ddd_obj,
type = "simple",
cluster = NULL,
balance_e = NULL,
min_e = -Inf,
max_e = Inf,
na.rm = FALSE,
boot = FALSE,
nboot = NULL,
cband = NULL,
alpha = 0.05
)
Arguments
ddd_obj |
a ddd object (i.e., the results of the |
type |
Which type of aggregated treatment effect parameter to compute.
|
cluster |
The name of the variable to be used for clustering. The maximum number of cluster variables is 1. Default is |
balance_e |
If set (and if one computes event study), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
na.rm |
Logical. Whether to remove missing values from the analysis. Default is FALSE. |
boot |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
nboot |
The number of bootstrap iterations to use. The default is the value set in the ddd object,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alpha |
The significance level for the confidence intervals. The default is 0.05; otherwise, the value set in the ddd object is used. |
Value
Aggregation object (list) of class agg_ddd
Take influence function and compute standard errors
Description
Takes an n x 1 influence function and returns a standard error.
Usage
compute_se_agg(influence_function, boot = FALSE, boot_std_errors = NA)
Arguments
influence_function |
An influence function |
boot |
a boolean indicating whether bootstrapping was performed |
boot_std_errors |
a vector of bootstrapped standard errors |
Value
scalar standard error
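As a rough illustration of how a standard error can be read off an influence function, the sketch below uses the usual plug-in formula, the square root of the mean squared influence function divided by n. The function name `se_from_inf` is hypothetical, and the sketch covers only the analytical (non-bootstrap) case; it is not taken from the package source.

```r
# Sketch: analytical standard error from an n x 1 influence function.
# `se_from_inf` is a hypothetical name, not a triplediff function.
se_from_inf <- function(inf) {
  n <- length(inf)
  sqrt(mean(inf^2) / n)
}

set.seed(1)
y <- rnorm(200, mean = 2, sd = 3)
inf_mean <- y - mean(y)        # influence function of the sample mean
se_hat <- se_from_inf(inf_mean)
se_hat                         # close to sd(y) / sqrt(length(y))
```

For the sample mean, this reproduces the familiar standard error up to the n versus n - 1 degrees-of-freedom factor.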
Doubly Robust DDD estimators for the group-time average treatment effects.
Description
ddd is the main function for computing the doubly robust DDD estimators for the ATT with balanced panel data. It can be used with covariates and/or with multiple time periods. At its core, triplediff employs the doubly robust estimator for the ATT, which combines propensity score weighting and outcome regression.
Furthermore, this package supports the application of machine learning methods for the estimation of the nuisance parameters.
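As background on what a triple-difference contrast measures, the sketch below computes the textbook unconditional DDD as the difference between two difference-in-differences, one among eligible units and one among ineligible units. It uses simulated data, no covariates, and no doubly robust correction; every variable name and number is an assumption for illustration, and this is not how `ddd` computes its estimates.

```r
set.seed(42)
n <- 4000
d <- data.frame(
  treated_state = rbinom(n, 1, 0.5),  # unit is in a state that adopts the policy
  eligible      = rbinom(n, 1, 0.5),  # unit belongs to the eligible partition
  post          = rbinom(n, 1, 0.5)   # observation is from the post period
)
true_effect <- 2                      # hypothetical effect built into the simulation
d$y <- 1 + 0.5 * d$treated_state + 0.3 * d$eligible + 0.7 * d$post +
  0.4 * d$treated_state * d$post + 0.2 * d$eligible * d$post +
  0.6 * d$treated_state * d$eligible +
  true_effect * d$treated_state * d$eligible * d$post + rnorm(n)

# Difference-in-differences of cell means within one eligibility stratum.
did <- function(s) {
  m <- function(ts, p) mean(s$y[s$treated_state == ts & s$post == p])
  (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))
}

# Triple difference: DiD among eligible units minus DiD among ineligible units.
ddd_hat <- did(d[d$eligible == 1, ]) - did(d[d$eligible == 0, ])
ddd_hat
```

In this saturated setting, `ddd_hat` coincides with the coefficient on the three-way interaction in `lm(y ~ treated_state * eligible * post, data = d)`.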
Usage
ddd(
yname,
tname,
idname,
gname,
pname,
xformla,
data,
control_group = NULL,
base_period = NULL,
est_method = "dr",
weightsname = NULL,
boot = FALSE,
nboot = NULL,
cluster = NULL,
cband = FALSE,
alpha = 0.05,
use_parallel = FALSE,
cores = 1,
inffunc = FALSE,
skip_data_checks = FALSE
)
Arguments
yname |
The name of the outcome variable. |
tname |
The name of the column containing the time periods. |
idname |
The name of the column containing the unit id. |
gname |
The name of the column containing the first period when a particular observation is treated. It is a positive number for treated units and defines which group the unit belongs to. It takes value 0 or Inf for untreated units. |
pname |
The name of the column containing the partition variable (e.g., the subgroup identifier). This is an indicator variable that is 1 for the units eligible for treatment and 0 otherwise. |
xformla |
The formula for the covariates to be included in the model. It should be of the form |
data |
A data frame or data table containing the data. |
control_group |
Valid for multiple periods only. The control group to be used in the estimation. Default is |
base_period |
Valid for multiple periods. Choose between a "varying" or "universal" base period. Both yield the same post-treatment ATT(g,t) estimates. Varying base period: Computes pseudo-ATT in pre-treatment periods by comparing outcome changes for a group to its comparison group from t-1 to t, repeatedly changing t. Universal base period: Fixes the base period to (g-1), reporting average changes from t to (g-1) for a group relative to its comparison group, similar to event study regressions. Varying base period reports ATT(g,t) right before treatment. Universal base period normalizes the estimate before treatment to be 0, adding one extra estimate in an earlier period. |
est_method |
The estimation method to be used. Default is |
weightsname |
The name of the column containing the weights. Default is |
boot |
Logical. If |
nboot |
The number of bootstrap samples to be used. Default is |
cluster |
The name of the variable to be used for clustering. The maximum number of cluster variables is 1. Default is |
cband |
Logical. If |
alpha |
The level of significance for the confidence intervals. Default is |
use_parallel |
Logical. If |
cores |
The number of cores to be used in the parallel processing. Default is |
inffunc |
Logical. If |
skip_data_checks |
Logical. If |
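The difference between the two `base_period` options can be summarized by which reference period each ATT(g,t) is compared against. The sketch below encodes the description given for `base_period`: a universal base always uses g-1, while a varying base uses t-1 in pre-treatment periods. That the varying base uses g-1 in post-treatment periods is an assumption, inferred from the statement that both options yield the same post-treatment estimates. The function name `base_period_for` is hypothetical.

```r
# Sketch: reference period for a given group g and period t.
# `base_period_for` is a hypothetical helper, not part of triplediff.
base_period_for <- function(g, t, base_period = c("varying", "universal")) {
  base_period <- match.arg(base_period)
  if (base_period == "universal" || t >= g) {
    g - 1   # universal base, or any post-treatment period: compare against g - 1
  } else {
    t - 1   # varying base, pre-treatment: compare t against t - 1
  }
}

# Group first treated in period 4, evaluated at periods 2 to 5.
varying_base   <- sapply(2:5, function(t) base_period_for(4, t, "varying"))
universal_base <- sapply(2:5, function(t) base_period_for(4, t, "universal"))
```

Here `varying_base` is 1, 2, 3, 3 and `universal_base` is 3, 3, 3, 3. Under the universal base, period 3 is compared with itself, which is why the estimate just before treatment is normalized to 0.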
Value
A ddd object with the following basic elements:
ATT |
The average treatment effect on the treated. |
se |
The standard error of the ATT. |
uci |
The upper bound of the confidence interval for the ATT. |
lci |
The lower bound of the confidence interval for the ATT. |
inf_func |
The estimate of the influence function. |
Examples
#----------------------------------------------------------
# Triple Diff with covariates and 2 time periods
#----------------------------------------------------------
set.seed(1234) # Set seed for reproducibility
# Simulate data for a two-periods DDD setup
df <- gen_dgp_2periods(size = 5000, dgp_type = 1)$data
head(df)
att_22 <- ddd(yname = "y", tname = "time", idname = "id", gname = "state",
pname = "partition", xformla = ~cov1 + cov2 + cov3 + cov4,
data = df, control_group = "nevertreated", est_method = "dr")
summary(att_22)
#----------------------------------------------------------
# Triple Diff with multiple time periods
#----------------------------------------------------------
data <- gen_dgp_mult_periods(size = 1000, dgp_type = 1)[["data"]]
ddd(yname = "y", tname = "time", idname = "id",
gname = "state", pname = "partition", xformla = ~cov1 + cov2 + cov3 + cov4,
data = data, control_group = "nevertreated", base_period = "varying",
est_method = "dr")
Function that generates panel data with single treatment date assignment and two time periods.
Description
Generate panel data with a single treatment date and two periods
Usage
gen_dgp_2periods(size, dgp_type)
Arguments
size |
Integer. Number of units. |
dgp_type |
Integer in {1,2,3,4}. 1 = both nuisance functions correct; 2 = only the outcome model correct; 3 = only the propensity score correct; 4 = both nuisance functions incorrect. |
Value
A list with the following elements:
- data: A data.table in long format with columns:
  - id: unit identifier
  - state: state variable
  - time: time variable
  - partition: partition assignment
  - x1, x2, x3, x4: covariates
  - y: outcome variable
  - cluster: cluster ID (no within-cluster correlation)
- att: True average treatment effect on the treated (ATT), set to 0.
- att.unf: Oracle ATT computed under the unfeasible specification.
- eff: Theoretical efficiency bound for the estimator.
Generate panel data with staggered treatment adoption (three periods)
Description
Generate panel data where units adopt treatment at different times across three periods.
Usage
gen_dgp_mult_periods(size, dgp_type = 1)
Arguments
size |
Integer. Number of units to simulate. |
dgp_type |
Integer in {1,2,3,4}. 1 = both nuisance functions correct; 2 = only the outcome model correct; 3 = only the propensity-score model correct; 4 = both nuisance functions misspecified. |
Value
A named list with components:
- data: A data.table in long format with columns:
  - id: unit identifier
  - cohort: first period when treatment is assigned
  - partition: partition indicator
  - x1, x2, x3, x4: covariates
  - cluster: cluster identifier (no within-cluster correlation)
  - time: time period index
  - y: observed outcome
- data_wide: A data.table in wide format (one row per id) with columns:
  - id, cohort, partition, x1, x2, x3, x4, cluster
  - y_t0, y_t1, y_t2: outcomes in periods 0, 1, and 2
- ES_0_unf: Unfeasible (oracle) event-study parameter at time 0.
- prob_g2_p1: Proportion of units with cohort == 2 and eligibility in period 1.
- prob_g3_p1: Proportion of units with cohort == 3 and eligibility in period 1.
Function to generate a fake dataset for testing purposes only.
Description
Generates a fake dataset for testing internal procedures.
Usage
generate_test_panel(
seed = 123,
num_ids = 100,
time = 2,
initial.year = 2019,
treatment.year = 2020
)
Arguments
seed |
Seed for reproducibility |
num_ids |
Number of IDs |
time |
Number of time periods |
initial.year |
Initial year |
treatment.year |
Treatment year |
Value
A data.table with the following columns:
id: ID
state: State variable
year: Time variable
partition: Partition variable
x1: Covariate 1
x2: Covariate 2
treat: Treatment variable
outcome: Outcome variable
Get an influence function for particular aggregate parameters
Description
Get an influence function for particular aggregate parameters
This is a generic internal function for combining influence functions across ATT(g,t)'s to return an influence function for various aggregated treatment effect parameters.
Usage
get_agg_inf_func(att, inf_func, whichones, weights_agg, wif = NULL)
Arguments
att |
vector of group-time average treatment effects |
inf_func |
influence function for all group-time average treatment effects (matrix) |
whichones |
which elements of att will be used to compute the aggregated treatment effect parameter |
weights_agg |
the weights to apply to each element of att(whichones); should have the same dimension as att(whichones) |
wif |
extra influence function term coming from estimating the weights; should be n x k matrix where k is dimension of whichones |
Value
nx1 influence function
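The arguments above suggest a simple linear combination. A minimal sketch, assuming the aggregated influence function is the weighted sum of the selected columns of `inf_func`, plus the `wif` term multiplied by the selected effects when `wif` is supplied, might look like the following. The function name `agg_inf_sketch` is hypothetical, and the formula is an assumption rather than a copy of the package internals.

```r
# Sketch: combine ATT(g,t) influence functions into one n x 1 influence function.
# `agg_inf_sketch` is a hypothetical name; the formula is an assumption.
agg_inf_sketch <- function(att, inf_func, whichones, weights_agg, wif = NULL) {
  # Weighted combination of the selected columns of the influence-function matrix.
  out <- inf_func[, whichones, drop = FALSE] %*% weights_agg
  # Optional correction for the aggregation weights themselves being estimated.
  if (!is.null(wif)) {
    out <- out + wif %*% att[whichones]
  }
  drop(out)  # return a plain numeric vector of length n
}

set.seed(7)
inf <- matrix(rnorm(15), nrow = 5, ncol = 3)  # 5 units, 3 group-time effects
att <- c(1, 2, 3)                             # made-up ATT(g,t) values
agg_if <- agg_inf_sketch(att, inf, whichones = c(1, 3),
                         weights_agg = c(0.25, 0.75))
```

The result has one entry per unit, so it can be passed to any routine that turns an n x 1 influence function into a standard error.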
Multiplier Bootstrap
Description
This function takes an influence function and uses the multiplier bootstrap to compute standard errors and critical values for uniform confidence bands.
Usage
mboot(inf_func, did_preprocessed, use_parallel = FALSE, cores = 1)
Arguments
inf_func |
an influence function |
did_preprocessed |
A |
use_parallel |
Boolean of whether or not to use parallel processing in the multiplier
bootstrap, default is |
cores |
the number of cores to use with parallel processing, default is |
Value
A list with the following elements:
bres |
results from each bootstrap iteration. |
V |
variance matrix. |
se |
standard errors. |
crit_val |
a critical value for computing uniform confidence bands. |
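For readers unfamiliar with the technique, the sketch below shows one common form of the multiplier bootstrap in base R: each draw reweights the influence function with random +1/-1 (Rademacher) multipliers, standard errors come from the interquartile range of the draws, and the uniform critical value is a quantile of the maximum absolute t-statistic. The function name `mboot_sketch`, its `nboot` and `alpha` arguments, and the choice of Rademacher multipliers are assumptions; the sketch does not take a `did_preprocessed` object and is not the package's implementation.

```r
# Sketch of a multiplier bootstrap on an n x k influence-function matrix.
# `mboot_sketch` and its arguments are hypothetical, for illustration only.
mboot_sketch <- function(inf_func, nboot = 999, alpha = 0.05) {
  inf_func <- as.matrix(inf_func)
  n <- nrow(inf_func)
  # One row of Rademacher multipliers (+1 or -1) per bootstrap draw.
  mult <- matrix(sample(c(-1, 1), nboot * n, replace = TRUE), nrow = nboot)
  # Each row of bres equals sqrt(n) * mean(multiplier * influence function).
  bres <- (mult %*% inf_func) / sqrt(n)
  # Robust scale per column from the interquartile range of the draws.
  b_sigma <- apply(bres, 2, function(b) IQR(b) / (qnorm(0.75) - qnorm(0.25)))
  # Critical value: (1 - alpha) quantile of the maximum absolute t-statistic.
  t_max <- apply(abs(sweep(bres, 2, b_sigma, "/")), 1, max)
  list(bres = bres, V = cov(bres), se = b_sigma / sqrt(n),
       crit_val = unname(quantile(t_max, 1 - alpha)))
}

set.seed(123)
n <- 500
y <- matrix(rnorm(2 * n), ncol = 2)
inf <- scale(y, scale = FALSE)   # influence functions of two sample means
res <- mboot_sketch(inf, nboot = 999)
res$se                           # close to apply(y, 2, sd) / sqrt(n)
```

Because the critical value is taken over the maximum across columns, it is typically larger than the pointwise normal quantile, which is what makes the resulting band uniform.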
Process results inside the att_gt_dr and att_gt_dml functions
Description
Process results inside the att_gt_dr and att_gt_dml functions.
Usage
process_attgt(attgt_list)
Arguments
attgt_list |
A list of results from the |
Value
A list with three vectors: group, att, and periods, containing the group, average treatment effect, and time period of each estimate, respectively.
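The reshaping described here, from a list of per-estimate results to three parallel vectors, can be sketched as follows. The structure of each list element is not documented above, so the field names `group`, `period`, and `att` are purely hypothetical, as is the function name `process_attgt_sketch`.

```r
# Sketch: flatten a list of per-(g,t) results into three parallel vectors.
# The element fields (group, period, att) are hypothetical assumptions.
process_attgt_sketch <- function(attgt_list) {
  list(
    group   = vapply(attgt_list, function(x) x$group, numeric(1)),
    att     = vapply(attgt_list, function(x) x$att, numeric(1)),
    periods = vapply(attgt_list, function(x) x$period, numeric(1))
  )
}

# Made-up input with two group-time results.
toy_results <- list(
  list(group = 2, period = 2, att = 1.0),
  list(group = 2, period = 3, att = 1.5)
)
flat <- process_attgt_sketch(toy_results)
```

Keeping the three vectors aligned by position means the i-th entry of each refers to the same group-time estimate.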