1. Difference-in-differences Estimation
Difference in differences (DID) is an econometric technique that attempts to identify the impact of a policy intervention on a treatment group relative to a control group. In contrast to a randomized controlled experiment, it uses observational data to identify this differential impact of treatment.
The main idea of DID is to compare the average change over time in the outcome variable for the treatment group with the average change over time for the control group.
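In symbols, this comparison can be written as below. The notation is introduced here for illustration and does not come from the original text: \bar{Y}_{g,t} denotes the mean outcome of group g (T for treatment, C for control) in period t (0 before and 1 after the intervention).

```latex
\[
\widehat{\text{DID}}
  = \left(\bar{Y}_{T,1} - \bar{Y}_{T,0}\right)
  - \left(\bar{Y}_{C,1} - \bar{Y}_{C,0}\right)
\]
```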
The concept of DID can be made clear with the following example:
Consider a policy intervention that provides skill training. We are interested in whether the wage rate rises as a result of this intervention.
In the graph above, the line HC shows the growth of wages for the control group, where the policy intervention is absent. The line GA shows the growth of wages for the treatment group, which receives skill training. In year 1, the total difference between the wages of the two groups is not the true impact of the policy intervention because there was a pre-existing difference in wages. Thus, the true impact is:
True Impact (AB) = AC - BC, where BC = GH
One crucial assumption made here is the parallel trends assumption. That is, if there had been no policy intervention, wages in both groups would have followed the same trend.
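A short derivation may help show why this assumption is what allows DID to recover the impact. The notation is an assumption made for this sketch: \bar{Y}_{g,t} is the mean wage of group g (T or C) in year t, and \bar{Y}^{0}_{T,1} is the unobserved mean wage the treatment group would have had in year 1 without training.

```latex
\begin{align*}
\bar{Y}^{0}_{T,1} - \bar{Y}_{T,0} &= \bar{Y}_{C,1} - \bar{Y}_{C,0}
  && \text{(parallel trends)} \\
\bar{Y}_{T,1} - \bar{Y}^{0}_{T,1}
  &= \left(\bar{Y}_{T,1} - \bar{Y}_{T,0}\right)
   - \left(\bar{Y}_{C,1} - \bar{Y}_{C,0}\right)
  && \text{(true impact)}
\end{align*}
```

The second line follows by substituting the first into the definition of the impact. If the trends are not parallel, the right-hand side no longer equals the true impact.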
DID Estimation in Stata
To illustrate DID estimation in Stata, consider the hypothetical data where :
- year and treat are dummy variables
- year = 0 for pre-intervention and 1 for post-intervention
- treat = 0 for the control group and 1 for the treatment group
- yeartreated = year*treat is an interaction dummy variable
- wage = hourly wage in dollars.
The Excel data file and the Stata do-file are available here.
| year | treat | yeartreated | wage |
| 0 | 1 | 0 | 12 |
| 0 | 0 | 0 | 4 |
| 0 | 1 | 0 | 22 |
| 0 | 0 | 0 | 3 |
| 0 | 1 | 0 | 12 |
| 0 | 0 | 0 | 3 |
| 0 | 1 | 0 | 24 |
| 0 | 0 | 0 | 15 |
| 0 | 1 | 0 | 12 |
| 0 | 0 | 0 | 8 |
| 0 | 1 | 0 | 12 |
| 0 | 0 | 0 | 4 |
| 0 | 1 | 0 | 15 |
| 0 | 0 | 0 | 6 |
| 0 | 1 | 0 | 11 |
| 0 | 0 | 0 | 6 |
| 0 | 1 | 0 | 12 |
| 0 | 0 | 0 | 15 |
| 0 | 1 | 0 | 2 |
| 1 | 0 | 0 | 12 |
| 1 | 1 | 1 | 21 |
| 1 | 0 | 0 | 12 |
| 1 | 1 | 1 | 23 |
| 1 | 0 | 0 | 16 |
| 1 | 1 | 1 | 33 |
| 1 | 0 | 0 | 12 |
| 1 | 1 | 1 | 24 |
| 1 | 0 | 0 | 15 |
| 1 | 1 | 1 | 29 |
| 1 | 0 | 0 | 21 |
| 1 | 1 | 1 | 32 |
| 1 | 0 | 0 | 14 |
| 1 | 1 | 1 | 23 |
| 1 | 0 | 0 | 13 |
| 1 | 1 | 1 | 53 |
| 1 | 0 | 0 | 16 |
| 1 | 1 | 1 | 24 |
| 1 | 0 | 0 | 25 |
| 1 | 1 | 1 | 42 |
Import the data into Stata.
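If the Excel file is not at hand, one way to get the same data into memory is to type the table directly in a do-file. This is only a sketch of one option, not necessarily how the original do-file loads the data. The rows are copied from the table above.

```stata
* Enter the example data by hand (rows copied from the table above)
clear
input year treat wage
0 1 12
0 0 4
0 1 22
0 0 3
0 1 12
0 0 3
0 1 24
0 0 15
0 1 12
0 0 8
0 1 12
0 0 4
0 1 15
0 0 6
0 1 11
0 0 6
0 1 12
0 0 15
0 1 2
1 0 12
1 1 21
1 0 12
1 1 23
1 0 16
1 1 33
1 0 12
1 1 24
1 0 15
1 1 29
1 0 21
1 1 32
1 0 14
1 1 23
1 0 13
1 1 53
1 0 16
1 1 24
1 0 25
1 1 42
end

* Interaction dummy: equals 1 only for treated units after the intervention
generate yeartreated = year * treat

* Check the number of observations in each year-by-group cell
tabulate year treat
```

The tabulate command should show 9 control and 10 treatment observations in year 0, and 10 of each in year 1.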
Method I : Regression Method
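We run the regression :
wage = α + β*year + γ*treat + δ*yeartreated + u
Here :
Expected wage for control group in year 0 is : α
Expected wage for treatment group in year 0 is : α+γ
Expected wage for control group in year 1 is : α+β
Expected wage for treatment group in year 1 is : α+β+γ+δ
DID = [(α+β+γ+δ)-(α+β)] - [(α+γ)-α] = (γ+δ)-γ = δ
Thus, the coefficient δ gives us the difference-in-differences estimator.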
To implement this regression in Stata, type the command :
reg wage year treat yeartreated, robust
The following result will appear.
Here, the DID estimate is 8.51, which is statistically significant at the 10 percent level.
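The same numbers can be pulled from Stata's stored results rather than read off the table. The sketch below assumes the example data are in memory with the variables wage, year, treat and yeartreated. The scalar names did_t and did_p are made up for this illustration.

```stata
* Assumes the example data are in memory with wage, year, treat, yeartreated
regress wage year treat yeartreated, vce(robust)

* Group means implied by the coefficients
display "control,   year 0 : " %6.2f (_b[_cons])
display "treatment, year 0 : " %6.2f (_b[_cons] + _b[treat])
display "control,   year 1 : " %6.2f (_b[_cons] + _b[year])
display "treatment, year 1 : " %6.2f (_b[_cons] + _b[year] + _b[treat] + _b[yeartreated])

* t-ratio and two-sided p-value of the DID coefficient
scalar did_t = _b[yeartreated] / _se[yeartreated]
scalar did_p = 2 * ttail(e(df_r), abs(did_t))
display "DID = " %6.3f (_b[yeartreated]) ",  t = " %5.2f did_t ",  p = " %6.4f did_p
```

Because the model has one parameter per year-by-group cell, the four displayed values are simply the four cell means of wage.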
The DID estimator can also be obtained by regression using the hashtag (##) operator :
*hashtag method
reg wage year##treat, r
The following output will appear in Stata:
The results in this table are essentially the same. The only difference is that here it is not necessary to define the interaction dummy variable before running the regression. Stata creates it automatically during estimation.
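To check that the two specifications agree, the interaction coefficient can be stored from each regression and compared. This is a sketch that assumes the example data are in memory with wage, year and treat. The scalar names did_manual and did_factor are illustrative.

```stata
* Assumes the example data are in memory with wage, year and treat
capture drop yeartreated
generate yeartreated = year * treat

* Explicit interaction dummy
quietly regress wage year treat yeartreated, vce(robust)
scalar did_manual = _b[yeartreated]

* Factor-variable notation: ## adds both main effects and the interaction
quietly regress wage i.year##i.treat, vce(robust)
scalar did_factor = _b[1.year#1.treat]

display "DID (manual dummy)    = " %6.3f did_manual
display "DID (factor notation) = " %6.3f did_factor
```

In the factor-variable output the interaction appears as 1.year#1.treat, not as yeartreated. It is the same coefficient.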
Method II : Installing the diff Program File
The user-written command diff can be installed to estimate the DID. For this, the Stata command is :
ssc install diff
If diff is already installed, Stata reports :
checking diff consistency and verifying not already installed...
all files already exist and are up to date.
Then type
diff wage, t(treat) p(year)
The following results will appear :
The results match those of the previous method.
Method III : t-test Method
The DID can be found by running a two-group t-test in year 0 and in year 1, as below :
ttest wage if year==0, by(treat)
The output is :
ttest wage if year==1, by(treat)
The output is :
Each t-test reports the mean of the control group minus the mean of the treatment group. The DID estimator is the difference between these two differences : -6.289-(-14.8)=8.511
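The subtraction can also be done inside Stata using the group means that ttest leaves behind in r(mu_1) and r(mu_2). The sketch assumes the example data are in memory. The scalar names diff0, diff1 and did_ttest are made up for this illustration.

```stata
* Assumes the example data are in memory with wage, year and treat
* After ttest ..., by(treat): r(mu_1) is the mean for treat==0,
* r(mu_2) is the mean for treat==1

* Control minus treatment, pre-intervention
quietly ttest wage if year == 0, by(treat)
scalar diff0 = r(mu_1) - r(mu_2)

* Control minus treatment, post-intervention
quietly ttest wage if year == 1, by(treat)
scalar diff1 = r(mu_1) - r(mu_2)

scalar did_ttest = diff0 - diff1
display "Year 0 difference = " %7.3f diff0
display "Year 1 difference = " %7.3f diff1
display "DID               = " %7.3f did_ttest
```

Note that this gives the point estimate only. The two separate t-tests do not provide a standard error for the DID itself.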
Method IV : Collapse Command
In Stata, type the following command :
collapse (mean) wage, by(year treat)
This command collapses the data set into four cells, one for each combination of the two dummies, and stores the mean wage of each cell.
Type the browse command and the result in our example will appear as :
| year | treat | wage |
| 0 | 0 | 7.11 |
| 0 | 1 | 13.4 |
| 1 | 0 | 15.6 |
| 1 | 1 | 30.4 |
In year zero, the pre-existing difference in mean wages between treatment and control group is : 13.4-7.11=6.29
In year 1, the difference in mean wages between the groups is : 30.4-15.6=14.8
Difference in differences =14.8-6.29=8.51
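The arithmetic above can also be left to Stata. The sketch below assumes the full example data (not the collapsed data) are in memory. It uses preserve and restore so that the original observations come back afterwards. The names gap and did_collapse are illustrative.

```stata
* Assumes the full example data are in memory with wage, year and treat
preserve
collapse (mean) wage, by(year treat)

* One row per year, with columns wage0 (control) and wage1 (treatment)
reshape wide wage, i(year) j(treat)
generate gap = wage1 - wage0

* DID = post-intervention gap minus pre-intervention gap
sort year
scalar did_collapse = gap[2] - gap[1]
display "DID = " %6.3f did_collapse
restore
```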
Among all these methods, the first and second are preferred because they provide the standard error of the DID, which is needed to decide whether the DID is statistically significant.
For comments, please contact srb863@g.harvard.edu
.....................................................................................................................................................................
2. Statistical and Economic Significance
Statistical significance and economic significance are two terms used quite often when interpreting regression results.
Statistical significance is a purely statistical concept. It refers to whether we can reject the hypothesis that there is no relationship between the variables at all, that is, that the true population coefficient is zero. Thus, it is a matter of hypothesis testing on the regression coefficient.
As a rule of thumb, if the standard error of the estimated regression coefficient is less than half the absolute value of the estimated coefficient (so that the t-ratio exceeds 2 in absolute value), the variable is said to be statistically significant in the relationship.
Equivalent ways of stating the statistical significance of a regression coefficient are :
A variable included in the regression is statistically significant if
- The standard error of the regression coefficient is less than half of the absolute value of the coefficient, or
- The t-statistic associated with the coefficient is greater than 1.96 in absolute value (taking the 5 % level of significance and a large sample), or
- The probability value (p-value) associated with the coefficient is less than 0.05 (again at the 5 % level of significance), or
- The 95 percent confidence interval of the coefficient does not include zero.
An Example : Consider the regression output given below :
The variable treat is statistically significant because:
- Method I : Standard error (2.26)<1/2 of coefficient(10.66)
- Method II : t-ratio is greater than 1.96.
- Method III : The probability value is less than 0.05.
- Method IV: The 95 percent confidence interval does not include zero.
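The four checks can be reproduced by hand from the coefficient (10.66) and standard error (2.26) quoted above. The sketch below uses the large-sample normal critical value, which is an assumption, because the degrees of freedom of the regression are not given in the text. Stata's own output uses the t distribution, so its p-value and interval will differ slightly. All scalar names are illustrative.

```stata
* Coefficient and standard error quoted in the example above
scalar b_hat  = 10.66
scalar se_hat = 2.26

* t-ratio, two-sided p-value and 95 percent interval, using the
* standard normal distribution as a large-sample approximation
scalar t_ratio = b_hat / se_hat
scalar p_val   = 2 * (1 - normal(abs(t_ratio)))
scalar ci_lo   = b_hat - invnormal(0.975) * se_hat
scalar ci_hi   = b_hat + invnormal(0.975) * se_hat

display "se < |b|/2 (1 = yes) : " (se_hat < abs(b_hat) / 2)
display "t-ratio              : " %6.3f t_ratio
display "p-value              : " %9.7f p_val
display "95 percent interval  : [" %6.3f ci_lo ", " %6.3f ci_hi "]"
```

All four criteria point the same way because they are restatements of the same comparison between the coefficient and its standard error.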
On the other hand, economic significance concerns whether the size of the coefficient matters for policy. For example, consider the following hypothetical output from a regression :
crime = 10 - 0.000001*educ + 0.0000236*drinking
Where,
crime = number of crimes committed in the ith society
educ = average years of education in the ith society
drinking = average level of alcohol drinking in the ith society
Suppose that the coefficients are statistically significant.
Do they matter for policy?
Would you recommend providing more education to control crime?
Even if we move a variable from its mean value to its first or third quartile (a large shift from a policy perspective, and a costly one), the change in crime is negligible. Thus, the coefficients are not economically significant.
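A quick calculation makes the point concrete. The size of the policy shift below (4 more years of average education) is a made-up figure for illustration only, since the text gives no quartiles. The coefficient and the intercept are taken from the hypothetical regression above.

```stata
* Coefficient on educ and intercept from the hypothetical regression above
scalar b_educ = -0.000001
scalar b_cons = 10

* Hypothetical policy shift (an assumption for illustration only):
* raise average education by 4 years
scalar d_educ = 4

scalar d_crime = b_educ * d_educ
display "Predicted change in crime         : " %12.8f d_crime
display "As a percent of the intercept (10): " %12.8f (100 * d_crime / b_cons)
```

Even this large and expensive shift changes predicted crime by only 0.000004, which is why the coefficient has little policy relevance whatever its p-value.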
...................................................................................................................................................................