CSCI E-106: Assignment 6


Instructions

Students should submit their reports on Canvas. Each report must clearly state which question is being solved, give a step-by-step walk-through of the solution, and clearly indicate the final answer. Please solve by hand where appropriate.

Please submit two files: (1) an R Markdown file (.Rmd extension) and (2) a PDF, Word, or HTML document generated using knitr from the .Rmd file submitted in (1), where appropriate. Please use RStudio Cloud for your solutions.



Problem 1
Refer to the Commercial properties data set. A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to provide clients with quantitative information upon which to make rental decisions. The variables in the data set are the age (X1), operating expenses and taxes (X2), vacancy rates (X3), total square footage (X4), and rental rates (Y). (35 points, 5 points each)

a-) Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your principal findings.

b-) Fit the regression model with four predictor variables to the data. State the estimated regression function.

c-) Obtain the residuals and prepare a box plot of the residuals. Does the distribution appear to be fairly symmetrical?

d-) Conduct the Breusch-Pagan test for constancy of the error variance, assuming log(σ_i^2) = γ0 + γ1X1 + γ2X2 + γ3X3 + γ4X4; use α = .01. State the alternatives, decision rule, and conclusion.

e-) Obtain a QQ plot and a plot of the residuals vs. fitted values, and comment on the graphs.

f-) Estimate β0, β1, β2, β3 and β4 jointly by the Bonferroni procedure, using a 95 percent family confidence coefficient. Interpret your results.

g-) For X1 = 5, X2 = 8.25, X3 = 0 and X4 = 250000, calculate the predicted rental rate and its 95% confidence interval.
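
As a reminder for part d-), the Breusch-Pagan statistic can be written as below. This is a sketch of the usual textbook form, assuming SSR* denotes the regression sum of squares from regressing the squared residuals e_i^2 on X1, ..., X4, and SSE is the error sum of squares of the original fit; this notation is an assumption, not taken from the assignment text.

```latex
H_0:\ \gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = 0
\qquad
H_a:\ \text{not all } \gamma_k = 0
\\[6pt]
X^2_{BP} = \frac{SSR^{*}/2}{\left(SSE/n\right)^{2}}
\\[6pt]
\text{If } X^2_{BP} \le \chi^2(.99;\,4) \approx 13.28, \text{ conclude } H_0 \text{ (constant error variance); otherwise conclude } H_a.
```

The 4 degrees of freedom correspond to the four predictors in the assumed variance function.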

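For parts f-) and g-), the interval formulas are likely to take the form sketched below. The symbols s{b_k}, s{Ŷ_h}, MSE, and X_h are standard regression notation assumed here, and p = 5 counts the intercept plus four slopes. Part g-) is read here as asking for the interval for the mean response; if a prediction interval for a new observation is intended, the variance gains an extra MSE term.

```latex
b_k \pm B\, s\{b_k\}, \qquad B = t\!\left(1 - \frac{\alpha}{2g};\ n - p\right), \quad g = 5,\ \alpha = .05,\ p = 5
\\[6pt]
\hat{Y}_h = \mathbf{X}_h' \mathbf{b}, \qquad
\hat{Y}_h \pm t\!\left(1 - \frac{\alpha}{2};\ n - p\right) s\{\hat{Y}_h\}, \qquad
s^2\{\hat{Y}_h\} = MSE\; \mathbf{X}_h' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}_h
\\[6pt]
\text{(new observation)} \qquad s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\}
```
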
Problem 2
Refer to the CDI data set. You have been asked to evaluate two alternative models for predicting the number of active physicians (Y) in a CDI. Proposed model I includes as predictor variables total population (X1), land area (X2), and total personal income (X3). Proposed model II includes as predictor variables population density (X1, total population divided by land area), percent of population greater than 64 years old (X2), and total personal income (X3). (40 points, 10 points each)

a-) Obtain the scatter plot matrix and the correlation matrix for each proposed model. Summarize the information provided.

b-) For each proposed model, fit the first-order regression model with three predictor variables.

c-) Calculate R² for each model. Is one model clearly preferable in terms of this measure?

d-) For each model, obtain the residuals and plot them against Ŷ and against each of the three predictor variables. Also prepare a normal probability plot for each of the two fitted models. Interpret your plots and state your findings. Is one model clearly preferable in terms of appropriateness?
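
For part c-), the coefficient of multiple determination is the usual ratio shown below, with SSR, SSE, and SSTO being the standard sums-of-squares notation (assumed here). Since both proposed models use the same response, the same observations, and three predictors each, their R² values can be compared directly without an adjustment for model size.

```latex
R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}, \qquad 0 \le R^2 \le 1
```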

Problem 3
Refer to the Grocery retailer data set. A large, national grocery retailer tracks productivity and costs of its facilities closely. Each data point for each variable represents one week of activity. The variables included are the number of cases shipped (X1), the indirect costs of the total labor hours as a percentage (X2), a qualitative predictor called holiday that is coded 1 if the week has a holiday and 0 otherwise (X3), and the total labor hours (Y). (25 points, 5 points each)

a-) Fit the regression model with three predictor variables to the data. State the estimated regression function. How are b1, b2, and b3 interpreted here? (5 points)

b-) Prepare a time plot of the residuals. Is there any indication that the error terms are correlated?

c-) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with X1; with X3, given X1; and with X2, given X1 and X3.

d-) Test whether X2 can be dropped from the regression model given that X1 and X3 are retained. Use the F* test statistic and α = .05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

e-) Does SSR(X1) + SSR(X2|X1) equal SSR(X2) + SSR(X1|X2) here? Must this always be the case? (5 points)
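
For part d-), the partial F test can be set up roughly as follows. This is a sketch in standard notation (assumed, not from the assignment), where the full model contains X1, X2, and X3, so the error degrees of freedom are n - 4.

```latex
H_0:\ \beta_2 = 0 \qquad H_a:\ \beta_2 \ne 0
\\[6pt]
F^{*} = \frac{SSR(X_2 \mid X_1, X_3)/1}{SSE(X_1, X_2, X_3)/(n - 4)}
\\[6pt]
\text{If } F^{*} \le F(.95;\ 1,\ n - 4), \text{ conclude } H_0; \text{ otherwise conclude } H_a.
\qquad
P\text{-value} = P\{F(1,\ n - 4) > F^{*}\}
```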

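The identity in part e-) can be checked numerically. The sketch below is a language-neutral illustration in plain Python rather than the R solution the assignment asks for. It uses a small made-up data set (not the grocery data), and the helper names `solve` and `ssr` are hypothetical. Each extra sum of squares is computed as a difference of regression sums of squares.

```python
# Synthetic illustration only: these numbers are made up, NOT the grocery data.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 17.0]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ssr(response, *predictors):
    """Regression sum of squares for response on an intercept plus the given predictors."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, response)) for i in range(p)]
    b = solve(xtx, xty)  # least-squares coefficients from the normal equations
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    ybar = sum(response) / len(response)
    return sum((f - ybar) ** 2 for f in fitted)


ssr_x1 = ssr(y, x1)
ssr_x2 = ssr(y, x2)
ssr_both = ssr(y, x1, x2)

# Extra sums of squares, defined as differences of regression sums of squares.
ssr_x2_given_x1 = ssr_both - ssr_x1
ssr_x1_given_x2 = ssr_both - ssr_x2

lhs = ssr_x1 + ssr_x2_given_x1
rhs = ssr_x2 + ssr_x1_given_x2
print(lhs, rhs, ssr_both)
```

Both sides reduce to SSR(X1, X2) up to floating-point rounding, which is why the order of entry changes the individual extra sums of squares but not their total.
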