You may work together to help each other solve problems, but you should create your own solutions and hand in your own work without copying others’ work.
Data: ‘Sales.csv’
The data consist of sales prices for a sample of homes from a US city and some features of the houses.
Variables:
LAST_SALE_PRICE: the sale price of the home
SQFT: area of the house (sq. ft.)
LOT_SIZE: area of the lot (sq. ft.)
BEDS: number of bedrooms
BATHS: number of bathrooms
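A minimal sketch of reading the data into one list per variable is given here. It assumes that the file has a header row using exactly the five variable names above and that every entry is numeric with no missing values; the assignment confirms none of this. The helper name `load_columns` is hypothetical.

```python
import csv

# Column names taken from the variable list; that they match the CSV header
# exactly is an assumption.
COLUMNS = ["LAST_SALE_PRICE", "SQFT", "LOT_SIZE", "BEDS", "BATHS"]


def load_columns(lines, columns=COLUMNS):
    """Return {column name: list of floats} from an iterable of CSV lines."""
    cols = {name: [] for name in columns}
    for row in csv.DictReader(lines):
        for name in columns:
            # float() raises ValueError on blank or non-numeric cells.
            cols[name].append(float(row[name]))
    return cols


# Usage, assuming Sales.csv sits in the working directory:
# with open("Sales.csv", newline="") as f:
#     cols = load_columns(f)
```

Because the function accepts any iterable of text lines, it can be tried on a few hand-typed rows before pointing it at the real file.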
1. Calculate all pairwise correlations between all five variables.
2. Make a scatterplot of the sale price versus the area of the house. Describe the association between these two variables.
3. Fit a simple linear regression model (Model 1) with sale price as response variable and area of the house (SQFT) as predictor variable. State the estimated value of the intercept and the estimated coefficient for the area variable.
4. Write the equation that describes the relationship between the mean sale price and SQFT.
5. State the interpretation in words of the estimated intercept.
6. State the interpretation in words of the estimated coefficient for the area variable.
7. Add the LOT_SIZE variable to the linear regression model (Model 2). How did the estimated coefficient for the SQFT variable change?
8. State the interpretation of the coefficient of SQFT in Model 2.
9. Report the R-squared values from the two models. Explain why they are different.
10. Report the estimates of the error variances from the two models. Explain why they are different.
11. State the interpretation of the estimated error variance for Model 2.
12. Test the null hypothesis that the coefficient of the SQFT variable in Model 2 is equal to 0. (Assume that the assumptions required for the test are met.)
13. Test the null hypothesis that the coefficients of both the SQFT and LOT_SIZE variables are equal to 0. Report the test statistic.
14. What is the distribution of the test statistic under the null hypothesis (assuming model assumptions are met)?
15. Report the p-value for the test in Q13.