This R package provides a big-data-friendly and memory-efficient difference-in-differences (DiD) estimator for staggered (and non-staggered) treatment contexts. It supports controlling for time-varying covariates, heteroskedasticity-robust standard errors, and (single and multi-way) clustered standard errors. It addresses four issues that arise in the context of large administrative datasets:
1. DiDforBigData provides estimation and inference for staggered DiD with millions of observations on a personal laptop. It is orders of magnitude faster than other available software if the sample size is large; see the demonstration here.
2. DiDforBigData helps by using much less memory than other software; see the demonstration here.
3. It relies on data.table for big data management and sandwich for robust standard error estimation, which are already installed with most R distributions. Optionally, it will use the fixest package to speed up the estimation if it is installed. If the progress package is installed, it will also provide a progress bar so you know how much longer the estimation will take.
4. DiDforBigData makes parallelization easy as long as the parallel package is installed.

To install the package from CRAN:
install.packages("DiDforBigData")

To install the package from GitHub:

devtools::install_github("setzler/DiDforBigData")

To use the package after it is installed:

library(DiDforBigData)

It is recommended to also make sure these optional packages have been installed:

library("progress")
library("fixest")
library("parallel")

There are only 3 functions in this package:
1. SimDiD(): This function simulates data.
2. DiDge(): This function estimates DiD for a single cohort and a single event time.
3. DiD(): This function estimates DiD for all available cohorts and event times.

Details for each function are available from the Function Documentation.
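
To make concrete what a DiD for a single cohort and a single event time measures, the following sketch computes the textbook two-group, two-period comparison by hand in base R. It is only an illustration under stated assumptions, not the package's internal implementation: the toy data, the column names (id, year, cohort, Y), the choice of the year before treatment as the baseline, and the helper mean_change() are all hypothetical.

```r
# Toy balanced panel: 4 units observed in 2009 and 2013.
# Column names (id, year, cohort, Y) are illustrative assumptions.
toy <- data.frame(
  id     = rep(1:4, each = 2),
  year   = rep(c(2009, 2013), times = 4),
  cohort = rep(c(2010, 2010, 2015, 2015), each = 2),  # year treatment starts
  Y      = c(10, 15,   # unit 1, treated from 2010
             12, 17,   # unit 2, treated from 2010
             20, 22,   # unit 3, not yet treated in 2013
             18, 20)   # unit 4, not yet treated in 2013
)

cohort_time      <- 2010  # cohort of interest
event_postperiod <- 3     # event time: 3 years after treatment starts
pre_year  <- cohort_time - 1                 # assumed baseline year (2009)
post_year <- cohort_time + event_postperiod  # 2013

# Hypothetical helper: change in the mean outcome from pre_year to post_year
mean_change <- function(d) {
  mean(d$Y[d$year == post_year]) - mean(d$Y[d$year == pre_year])
}

treated <- toy[toy$cohort == cohort_time, ]  # units in the 2010 cohort
control <- toy[toy$cohort > post_year, ]     # units still untreated in 2013

# DiD: change for the treated cohort minus change for the control group
did_estimate <- mean_change(treated) - mean_change(control)
print(did_estimate)
```

In this toy panel the treated cohort's mean outcome rises by 5 and the control group's by 2, so the hand-computed DiD is 3.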
Before estimation, set up a variable list with the names of your variables:
varnames = list()
varnames$time_name = "year"
varnames$outcome_name = "Y"
varnames$cohort_name = "cohort"
varnames$id_name = "id"

To estimate DiD for a single cohort and event time, use the DiDge command. For example:
DiDge(inputdata = yourdata, varnames = varnames,
      cohort_time = 2010, event_postperiod = 3)

A detailed manual explaining the various features available in DiDge is available here or by running this command in R:

?DiDge

To estimate DiD for many cohorts and event times, use the DiD command. For example:
DiD(inputdata = yourdata, varnames = varnames,
    min_event = -3, max_event = 5)

A detailed manual explaining the various features available in DiD is available here or by running this command in R:

?DiD

For more information, read the following articles:
Acknowledgements: Thanks to Mert Demirer and Kirill Borusyak for helpful comments.