Starting from:

$30

Economics 7103 Homework 2 -Solved

You have access to imaginary data on an energy-efficiency retrofit program in Atlanta kwh.csv and you are 
interested in whether the program reduced energy use. In your dataset is the following information: 
Variable 
Description 
electricity 
kWh of electricity used by the household in the month 
sqft 
Square feet of the home 
retrofit 
= 1 if the home received a retrofit 
temp 
The outdoor average temperature (◦F) during the month at the home’s location 
Table 1: Variable descriptions for homework 1. 
After recruiting the households for the program, you assigned them to treatment and control groups. 
Treatment homes received the retrofits on the first of the month and control homes did not have any work 
done. 
1 Python 
1. Check for balance between the treatment and control groups using Python. Create a table that displays 
each variable’s sample mean, sample standard deviation, and p-values for the two-way t-test between 
treatment and control group means. Your table should have four columns: one with variable names, one 
with sample mean and standard deviation for the control group, one with sample mean and standard 
deviation for the treatment group, and one with the p-value for the difference-in-means test. Does it 
appear that the randomization worked? If so, what can we say about the simple difference-in-means 
estimate? 
2. Provide graphical evidence that the retrofits worked. Plot kernel density plots of the electricity use for 
treated group and control group on the same graph using Python. Make sure to label the histogram 
appropriately. 
3. Suppose you want to estimate the linear equation Y = βX + ε where Y is an n × 1 vector of the 
dependent variable, X is an n × p + 1 matrix of the predictor variables in table 1 and a column of 
ones, and ϵ is an n × 1 vector of unobserved random error. Use the following methods to estimate βˆ, 
presenting coefficients in a single table with a column for each estimation technique (note I am not 
requiring that you present confidence intervals): 
(a) OLS by hand. Use the Numpy package in Python to create an array X that is the n×p+ 1 matrix 
of the predictor variables in table 1 and a column of ones and an array Y that is the n × 1 vector 
of the dependent variable. Use matrix operations to calculate βˆ. Recall that βˆ = (X′X)−1X′Y 
is the closed-form solution to the least-squares minimization problem. 
(b) OLS by simulated least squares. Use the Scipy.optimize.minimize() function in Python to nu
merically minimize the sum of squares objective function. Recall that the sum of squares is 
P n
i=1 (yi − βxi)2 where yi and xi are (1 × 1) and (1 × p + 1) vectors respectively. 
(c) OLS using a canned routine. Use the StatsModels package in Python using the OLS routine. 
12 Stata 
1. Check for balance between the treatment and control groups using Stata. Create a table that dis
plays each variable’s sample mean, sample standard deviation, and p-values for the two-way t-test 
between treatment and control group means. Your table should have four columns: one with vari
able names, one with sample mean and standard deviation for the control group, one with sample 
mean and standard deviation for the treatment group, and one with the p-value for the difference-in
means test. Hint: https://www.statalist.org/forums/forum/general-stata-discussion/general/1519721- 
summarized-statistics-table-with-t-test-for-difference-in-means contains useful code. 
2. Create a two-way scatterplot with electricity consumption on the y-axis and square feet on the x-axis 
using Stata’s twoway command. Make sure to label the axes. 
3. Estimate the same regression as in #3 above using Stata’s regress command, estimating heteroskedasticity
robust standard errors. Report the results in a new LaTeX table (including standard errors) using 
Stata’s outreg2 command. 
2

More products