This assignment relates to the ‘Real-estate’ data set. People buying or selling houses would like to know how much they can expect to get, or pay, for a property. This is also a concern for those who make mortgage loans and for those who tax real estate (and who are more likely than individual home-owners to commission statistical studies). The price of a house depends on its physical characteristics, including size, features, quality of construction, age, etc. It also depends on location and current market characteristics. You are approached by a research group that has data on a sample of residential sales in a midwestern city; the variables are described in the table below.
1. Read the data into R. Call the loaded data “real.estate”.
2. Answer the following sub-questions:
i) Use the “summary()” function to identify the types of variables. Which variables are categorical? Which variables are quantitative? Are there any concerns in the summary table? Explain.
ii) Use the “pairs()” or “gpairs()” function to produce a scatterplot matrix of the first ten columns (variables) of the data. Recall that you can reference the first ten columns of a matrix A using A[, 1:10]. Are there any interesting patterns? Which variables seem to be associated with the sales price? Explain.
iii) Use the “as.factor()” function to convert the categorical variables to factors.
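As an illustration of the functions named in step 2, the following is a minimal sketch on a small simulated data frame, not on the real-estate data. The data frame “demo” and its column names (price, sqft, quality, pool) are hypothetical stand-ins, and the commented-out read.csv() line assumes a CSV file at a placeholder path.

```r
# The real data would be loaded with something like the line below.
# The path and the CSV format are assumptions, so the call is left commented out.
# real.estate <- read.csv("path/to/file.csv")

# Simulated stand-in data; the column names are hypothetical.
set.seed(1)
n <- 50
sqft <- round(rnorm(n, mean = 2000, sd = 300))
demo <- data.frame(
  price   = round(50 + 0.1 * sqft + rnorm(n, sd = 15)),  # quantitative
  sqft    = sqft,                                        # quantitative
  quality = sample(1:3, n, replace = TRUE),              # categorical, coded 1-3
  pool    = sample(0:1, n, replace = TRUE)               # categorical, coded 0/1
)

# While the categories are stored as integers, summary() reports
# means and quartiles for them, which are not meaningful.
summary(demo)

# Scatterplot matrix of the first three columns.
pairs(demo[, 1:3])

# Convert the integer-coded categories to factors.
demo$quality <- as.factor(demo$quality)
demo$pool    <- as.factor(demo$pool)

# summary() now reports a count for each level of the factors.
summary(demo)
```

Comparing the two summary() outputs shows why integer-coded categories are worth checking for in sub-question (i).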
3. Fit the models below, addressing each of the following as you build them:
i) Fit the null model and the full model.
ii) Find the best sets of predictors using the stepwise procedures.
iii) Find the best sets of predictors using the best subset approach.
iv) Considering the models in part (ii) and part (iii), choose the best model.
v) Interpret the coefficients of the best model in this context.
vi) Evaluate the best model.
vii) Check the model assumptions using the “plot()” function.
viii) Continue exploring the data, and provide a brief summary of what you discover.
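
The mechanics of step 3 might be sketched as follows on simulated data, not on the real-estate data. The data frame “demo” and its variables are hypothetical, stepwise selection here uses AIC through the base R function step(), and the best subset search assumes the add-on package “leaps” is installed (the block skips that part otherwise). Other selection criteria or packages may be what the course intends.

```r
# Simulated stand-in data; the variable names are hypothetical.
set.seed(2)
n <- 100
demo <- data.frame(
  sqft = rnorm(n, 2000, 300),
  age  = runif(n, 0, 50),
  lot  = rnorm(n, 8000, 1500),
  pool = factor(sample(c("no", "yes"), n, replace = TRUE))
)
# By construction, price depends on sqft and age only.
demo$price <- 50 + 0.1 * demo$sqft - 0.5 * demo$age + rnorm(n, sd = 15)

# Null model (intercept only) and full model (all predictors).
null_fit <- lm(price ~ 1, data = demo)
full_fit <- lm(price ~ ., data = demo)

# Stepwise selection by AIC in both directions, starting from the null model.
step_fit <- step(null_fit,
                 scope = list(lower = formula(null_fit),
                              upper = formula(full_fit)),
                 direction = "both", trace = 0)

# Best subset search, run only if the leaps package is available.
if (requireNamespace("leaps", quietly = TRUE)) {
  subsets   <- leaps::regsubsets(price ~ ., data = demo, nvmax = 4)
  sub_sum   <- summary(subsets)
  best_size <- which.max(sub_sum$adjr2)  # model size with the highest adjusted R^2
  print(coef(subsets, best_size))
}

# Coefficient estimates and overall fit of the stepwise model.
summary(step_fit)

# The four standard diagnostic plots on one page.
op <- par(mfrow = c(2, 2))
plot(step_fit)
par(op)
```

The diagnostic plots cover residuals against fitted values, a normal Q-Q plot, a scale-location plot, and residuals against leverage, which together address linearity, normality, constant variance, and influential points.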