Difference-in-Differences (DiD) has become one of the most popular research designs used to evaluate causal effects of policy interventions. In its canonical form, there are two time periods and two groups: in the first period no unit is treated, and in the second period some units are treated (the treated group) while others are not (the comparison group).
If, in the absence of treatment, the average outcomes for the treated and comparison groups would have followed parallel paths over time (the so-called parallel trends assumption), one can estimate the average treatment effect for the treated subpopulation (ATT) by comparing the average change in outcomes experienced by the treated group to the average change in outcomes experienced by the comparison group.
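The comparison of average changes just described can be written as ATT = (mean post-period outcome minus mean pre-period outcome for the treated group) minus (the same difference for the comparison group). The short sketch below computes this sample analogue from four lists of outcomes; the function name and the toy numbers are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean

def did_2x2(y_pre_treated, y_post_treated, y_pre_comp, y_post_comp):
    """Canonical two-period, two-group DiD estimate of the ATT.

    Returns the average change for the treated group minus the average
    change for the comparison group; under parallel trends this
    difference identifies the ATT.
    """
    change_treated = mean(y_post_treated) - mean(y_pre_treated)
    change_comp = mean(y_post_comp) - mean(y_pre_comp)
    return change_treated - change_comp

# Hypothetical toy data, not taken from any application
treated_pre, treated_post = [10.0, 12.0], [15.0, 17.0]
comp_pre, comp_post = [8.0, 9.0], [10.0, 11.0]
print(did_2x2(treated_pre, treated_post, comp_pre, comp_post))  # 3.0
```

Here the treated group's average rises by 5 and the comparison group's by 2, so the estimate attributes the extra 3 units of change to the treatment.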
Methodological extensions of DiD often focus on this standard two-period, two-group setup; see, e.g., Heckman et al. (1997, 1998), Abadie (2005), Athey and Imbens (2006), Qin and Zhang (2008), Bonhomme and Sauder (2011), de Chaisemartin and D’Haultfœuille (2017), Botosaru and Gutierrez (2018), Callaway et al. (2018), and Sant’Anna and Zhao (2020).
Many DiD empirical applications, however, deviate from the canonical DiD setup and have more than two time periods and variation in treatment timing.
In this article, we provide a unified framework for average treatment effects in DiD setups with multiple time periods and variation in treatment timing, including setups in which the parallel trends assumption may hold only after conditioning on observed covariates.
We concentrate our attention on DiD with staggered adoption, i.e., on DiD setups in which, once units are treated, they remain treated in all subsequent periods.
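To make staggered adoption and the notion of a "group" (the period in which a unit is first treated) concrete, here is a minimal sketch. It assumes each unit's treatment history is stored as a list of 0/1 indicators for periods 1, ..., T; the function names and the convention of labeling never-treated units as group 0 are our own illustrative choices, not notation from the paper.

```python
def is_staggered(path):
    """True if treatment never switches off, i.e., no 1 is followed by a 0."""
    return all(not (prev == 1 and nxt == 0) for prev, nxt in zip(path, path[1:]))

def first_treatment_period(path):
    """Group label: first period (1-indexed) with treatment, or 0 if never treated."""
    for period, d in enumerate(path, start=1):
        if d == 1:
            return period
    return 0

# Hypothetical treatment histories over four periods
histories = {
    "a": [0, 0, 1, 1],  # first treated in period 3
    "b": [0, 1, 1, 1],  # first treated in period 2
    "c": [0, 0, 0, 0],  # never treated
    "d": [0, 1, 0, 1],  # treatment switches off: not staggered adoption
}
for unit, path in histories.items():
    if is_staggered(path):
        print(unit, first_treatment_period(path))
    else:
        print(unit, "violates staggered adoption")
```

Under staggered adoption, the first-treatment period summarizes a unit's entire treatment path, which is why groups can be defined by it.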
The core of our proposal is to separate the DiD analysis into three steps:
Identification of policy-relevant disaggregated causal parameters;
Aggregation of these parameters to form summary measures of the causal effects;
Estimation and inference about these different target parameters.
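As a hedged illustration of the aggregation step, the sketch below assumes that disaggregated effects have already been estimated as group-time average treatment effects ATT(g, t), indexed by the first-treatment period g and calendar period t, and stored in a dictionary. It forms two possible summaries: an overall average of post-treatment cells (t ≥ g) and an average at a fixed event time e = t − g, both weighting cells by group size. These weighting schemes, the function names, and the numbers are illustrative assumptions.

```python
def aggregate_simple(att, group_size):
    """Weighted average of post-treatment ATT(g, t) cells (t >= g), weighted by group size."""
    num = 0.0
    den = 0.0
    for (g, t), effect in att.items():
        if t >= g:  # keep only periods at or after first treatment
            num += group_size[g] * effect
            den += group_size[g]
    return num / den

def aggregate_event_time(att, group_size, e):
    """Weighted average of ATT(g, g + e) across groups observed e periods after treatment."""
    cells = [(g, effect) for (g, t), effect in att.items() if t - g == e]
    total = sum(group_size[g] for g, _ in cells)
    return sum(group_size[g] * effect for g, effect in cells) / total

# Hypothetical estimates: keys are (g, t); (3, 2) is a pre-treatment cell
att = {(2, 2): 1.0, (2, 3): 2.0, (3, 2): 0.0, (3, 3): 1.0}
group_size = {2: 100, 3: 300}
print(aggregate_simple(att, group_size))         # about 1.2
print(aggregate_event_time(att, group_size, 0))  # 1.0
print(aggregate_event_time(att, group_size, 1))  # 2.0
```

Keeping the disaggregated cells separate from the weights makes it transparent which summary is being reported and how each group contributes to it.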
Our approach allows for estimation and inference on interpretable causal parameters under arbitrary treatment effect heterogeneity and dynamic effects, thereby completely avoiding the issues of interpreting the results of standard two-way fixed effects (TWFE) regressions as causal effects in DiD setups, as pointed out by Borusyak and Jaravel (2017), de Chaisemartin and D’Haultfœuille (2020), Goodman-Bacon (2019), Sun and Abraham (2020), and Athey and Imbens (2018).
In addition, it adds transparency and objectivity to the analysis (Rubin, 2007, 2008), and allows researchers to exploit a variety of estimation methods to answer different questions of interest.
The identification step of the analysis provides a blueprint for the other steps. In this paper, we pay particular attention to the disaggregated causal parameter that we call the group-time average treatment effect, i.e., the average treatment effect for group g at time t, where a “group” is defined by the time period when units are first treated. In the canonical DiD setup with two periods and two groups, these parameters reduce to the ATT, which is typically the parameter of interest in that setup. An attractive feature of the group-time average treatment effect parameters is that they do not directly restrict heterogeneity with respect to observed covariates, the period in which units are first treated, or the evolution of treatment effects over time. As a consequence, these easy-to-interpret causal parameters can be used directly to learn about treatment effect heterogeneity and/or to construct many other, more aggregated causal parameters. We view this level of generality and flexibility as one of the main advantages of our proposal. We provide sufficient conditions related to treatment anticipation behavior and conditional parallel trends under which these group-time average treatment effects are nonparametrically point-identified. A unique feature of our framework is that it shows how researchers can flexibly incorporate covariates into the staggered DiD setup with multiple groups and multiple periods. This is particularly important in applications in which differences in observed characteristics create non-parallel outcome dynamics between different groups; in this case, unconditional DiD strategies are generally not appropriate for recovering sensible causal parameters of interest (Heckman et al., 1997, 1998; Abadie, 2005).
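A minimal sketch of the simplest version of a group-time average treatment effect, assuming balanced panel data, no covariates, no anticipation, and never-treated units as the comparison group: ATT(g, t) is estimated as the average "long difference" Y_t − Y_{g−1} for units first treated in period g, minus the same average for never-treated units. The data layout (a list of (group, outcomes) pairs with never-treated units labeled 0), the function name, and the toy panel are hypothetical; the covariate-adjusted versions discussed in the text are not implemented here.

```python
from statistics import mean

def att_gt(data, g, t):
    """Unconditional ATT(g, t) using never-treated units (group 0) as comparisons.

    data: list of (group, outcomes), where outcomes[s - 1] is the outcome in period s.
    The base period is g - 1, the last period before group g is first treated.
    """
    if g < 2:
        raise ValueError("need at least one pre-treatment period (g >= 2)")
    base = g - 1

    def long_diff(outcomes):
        return outcomes[t - 1] - outcomes[base - 1]

    treated = [long_diff(y) for grp, y in data if grp == g]
    never_treated = [long_diff(y) for grp, y in data if grp == 0]
    return mean(treated) - mean(never_treated)

# Hypothetical three-period panel
panel = [
    (2, [1.0, 4.0, 6.0]),
    (2, [2.0, 5.0, 7.0]),
    (3, [1.0, 2.0, 5.0]),
    (0, [1.0, 2.0, 3.0]),
    (0, [3.0, 4.0, 5.0]),
]
for g, t in [(2, 2), (2, 3), (3, 3)]:
    print((g, t), att_gt(panel, g, t))  # 2.0, 3.0, 2.0
```

With two periods and a single treated group (g = 2), this reduces to the canonical two-by-two DiD comparison.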
We propose three different types of DiD estimands in staggered treatment adoption setups: one based on outcome regressions (Heckman et al., 1997, 1998), one based on inverse probability weighting (Abadie, 2005), and one based on doubly robust methods (Sant’Anna and Zhao, 2020). We provide versions of these estimands both for panel data and for repeated cross-section data. To the best of our knowledge, this paper is the first to show how one can allow for covariate-specific trends across groups in DiD setups with variation in treatment timing. Our results also highlight that, in practice, one can rely on different types of parallel trends assumptions and allow some types of treatment anticipation behavior; our proposed estimands explicitly reflect these assumptions.
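To illustrate how the three types of estimands differ, here is a sketch for the two-period panel case, written from standard textbook formulations rather than taken from the paper. It assumes a single discrete covariate x and records of the form (x, d, dy), where d is the treatment indicator and dy is the change in outcomes. The nuisance functions are estimated cell by cell: the propensity score p(x) = P(D = 1 | X = x) and the comparison-group outcome change m0(x) = E[dY | X = x, D = 0]. In practice one would typically fit parametric working models (for example, a logit and a linear regression) instead. The sketch also assumes every covariate cell contains comparison units. All names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cell_models(data):
    """Saturated nuisance models: p(x) = P(D=1 | X=x) and m0(x) = E[dY | X=x, D=0]."""
    n_treated, n_total, comp_dy = defaultdict(int), defaultdict(int), defaultdict(list)
    for x, d, dy in data:
        n_total[x] += 1
        if d == 1:
            n_treated[x] += 1
        else:
            comp_dy[x].append(dy)
    p = {x: n_treated[x] / n_total[x] for x in n_total}
    m0 = {x: mean(v) for x, v in comp_dy.items()}
    return p, m0

def att_or(data):
    """Outcome regression: treated changes minus predicted untreated changes."""
    _, m0 = cell_models(data)
    return mean(dy - m0[x] for x, d, dy in data if d == 1)

def att_ipw(data):
    """Normalized IPW: comparison units reweighted by p(x) / (1 - p(x))."""
    p, _ = cell_models(data)
    treated_mean = mean(dy for x, d, dy in data if d == 1)
    w = [(p[x] / (1 - p[x]), dy) for x, d, dy in data if d == 0]
    return treated_mean - sum(wi * dy for wi, dy in w) / sum(wi for wi, _ in w)

def att_dr(data):
    """Doubly robust: IPW-style contrast applied to outcome-regression residuals."""
    p, m0 = cell_models(data)
    treated_part = mean(dy - m0[x] for x, d, dy in data if d == 1)
    w = [(p[x] / (1 - p[x]), dy - m0[x]) for x, d, dy in data if d == 0]
    return treated_part - sum(wi * r for wi, r in w) / sum(wi for wi, _ in w)

# Hypothetical records (x, d, dy): comparison trends differ by covariate cell
data = [(0, 1, 5.0), (0, 1, 7.0), (0, 0, 1.0), (0, 0, 3.0),
        (1, 1, 10.0), (1, 0, 4.0), (1, 0, 6.0), (1, 0, 8.0)]
print(att_or(data), att_ipw(data), att_dr(data))  # each approximately 4.0
```

With saturated cell models the three estimates coincide (here all equal 4, up to floating-point rounding), whereas an unconditional DiD that ignores x gives roughly 2.93 on the same data, because the comparison group over-represents the cell with faster outcome growth. The estimands diverge once the working models are parametric and possibly misspecified, which is where the doubly robust form is designed to help.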