$35
Exercise 1: Cat Model
For this exercise we will use the cats dataset from the MASS package. You should use ?cats to learn about the background of this dataset.
part a
Fit the following simple linear regression model in R. Use heart weight as the response and body weight as the predictor.
Yi = β0 + β1xi + i
Store the results in an R object called cat_model. Run a summary of the model and report the following:
• The estimated value for β0, along with an interpretation
• The estimated value for β1, along with an interpretation
• The estimated value for σ (no interpretation needed)
• The estimated value for R2, along with an interpretation
Make sure that all interpretations include units and are given in the context of the problem.
# Use this code chunk for your answer. cat_model = lm(data = cats, Hwt ~ Bwt) summary(cat_model)
##
## Call:
## lm(formula = Hwt ~ Bwt, data = cats)
##
## Residuals:
## Min 1Q Median 3Q Max ## -3.5694 -0.9634 -0.0921 1.0426 5.1238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.3567 0.6923 -0.515 0.607
## Bwt 4.0341 0.2503 16.119 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 142 degrees of freedom ## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441
## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16
part b
Does the estimate for β0 provide a meaningful value? Is it reliable? Explain.
part c
Calculate the estimated standard error for the estimated slope. That is, calculate the standard deviation for the sampling distribution of the estimated slope.
Then, compare this value to that reported in the R output from Exercise 1a.
# Use this code chunk for your answer.
x_bar = mean(cats$Bwt) sxx = sum((cats$Bwt - x_bar)ˆ2) sqrt((1.452ˆ2)/sxx)
## [1] 0.2501972
sqrt((1.452ˆ2)/sxx) - 0.2503
## [1] -0.0001028456
Exercise 2: Cat Inference
For this exercise, we will continue analyzing the model fit in Exercise 1.
part a
Let’s assume that for another animal, the relationship between body weight and heart weight is such that a 1 kg increase in body weight tends to result in a 3.25 g increase in heart weight, on average.
Might this same relationship be true for cats, or is there evidence that this relationship for cats is different? Use a t test to test:
• H0 : β1 = 3.25 • H1 : β1 = 36 .25
Report the following in your document outside the code block.
• The value of the updated test statistic
• Whether the p-value of the test will be less than 0.05, or greater than 0.05
• A statistical decision at α = 0.05
# Use this code chunk for your answer, if needed.
summary(cat_model)
##
## Call:
## lm(formula = Hwt ~ Bwt, data = cats)
##
## Residuals:
## Min 1Q Median 3Q Max ## -3.5694 -0.9634 -0.0921 1.0426 5.1238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.3567 0.6923 -0.515 0.607
## Bwt 4.0341 0.2503 16.119 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 142 degrees of freedom ## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441 ## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16
t_stat = (4.0341 - 3.25)/0.2503 t_stat
## [1] 3.132641
part b
Compute a 70% confidence interval for β0. (You may use the confint function for this assignment.) Then, based on this confidence interval, anticipate the decision of the following two hypothesis tests:
• H_0: β0 = 0 vs.H_1: β0 =6 0
• H_0: β0 = -1 vs. H_1: β0 =6 -1
Explain your reasoning.
# Use this code chunk for your answer.
confint(cat_model, parm = '(Intercept)', level = 0.7,)
## 15 % 85 %
## (Intercept) -1.076791 0.3634664
part c
The inference procedures from parts a and b being valid relies on some assumptions being met. First, write out the four assumptions.
part d
Which of these four assumptions cannot be checked with a plot?
part e
Create the two plots (it’s ok if you end up with 4 plots) that we can use to check our assumptions for the cats data. Interpret these two plots. Be sure to specify if the assumptions seem reasonable and describe what you see that supports your conclusions.
# Use this code chunk for your answer.
plot(cat_model)
Fitted values
Theoretical Quantiles lm(Hwt ~ Bwt)
Fitted values lm(Hwt ~ Bwt)
Leverage lm(Hwt ~ Bwt)
shapiro.test(resid(cat_model))
##
## Shapiro-Wilk normality test
##
## data: resid(cat_model)
## W = 0.9845, p-value = 0.1046
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(cat_model)
##
## studentized Breusch-Pagan test
##
## data: cat_model
## BP = 9.5294, df = 1, p-value = 0.002022
Exercise 3: Cat Model Predictions
Continue analyzing the cats dataset using the model generated in Exercise 1.
part a
Use a 99% confidence interval to estimate the mean heart weight for body weights of 2.1 kilograms as well as for 2.8 kilograms.
Which of the two intervals is wider? How might you use the variance formula for yˆ(x) to know beforehand which would be wider?
# Use this code chunk for your answer.
predict(cat_model, newdata = data.frame('Bwt' = c(2.1, 2.8)), interval = 'confidence', level = 0.99)
## fit lwr upr ## 1 8.114869 7.599225 8.630513 ## 2 10.938713 10.618796 11.258630
mean(cats$Bwt)
## [1] 2.723611
sd(cats$Bwt)
## [1] 0.4853066
.
part b
Use a 99% prediction interval to predict the heart weights for body weights of 4.2 kilograms. Interpret this interval in context. Do you trust this prediction interval? Explain why or why not.
# Use this code chunk for your answer.
predict(cat_model, newdata = data.frame('Bwt' = c(4.2)), interval = 'prediction', level = 0.99)
## fit lwr upr
## 1 16.5864 12.66088 20.51192
summary(cats$Bwt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.300 2.700 2.724 3.025 3.900
part c
A confidence interval for the mean heart weights is reported as: (13.3427, 14.1827). Determine the cat body weight that this confidence interval corresponds to.
# Use this code chunk, if needed, to perform any calculations.
bounds_avg = (13.3427 + 14.1827)/2
x_star = (bounds_avg - as.numeric(coef(cat_model)[1]))/as.numeric(coef(cat_model)[2]) x_star
## [1] 3.500035
part d
Suppose that I want to generate a confidence interval for the mean heart weights that is narrower than that described in part c. Describe two ways that I can create a narrower confidence interval.
Exercise 4: Simulations
Consider the model
Y = 4 + 0x + ε
with
ε ∼ N(µ = 0,σ2 = 36)
Before answering the following parts, set a seed value equal to your birthday, as was done in class (this time in the format yyyymmdd).
birthday = 20010216 set.seed(birthday)
part a
Repeat the process of simulating n = 125 observations from the above model 2500 times. Use the x created in the code chunk below for all iterations.
n = 125 x = runif(n, -2, 2) reps = 2500 beta0 = 4 beta1 = 0 sigma = 6
beta0hat = vector(length = reps) beta1hat = vector(length = reps)
for(i in 1:reps){
epsilon = rnorm(n, 0, sigma) y = beta0 + beta1 * x + epsilon mymodel = lm(y ~ x) beta0hat[i] = coef(mymodel)[1] ## change this line of code so 00 is now the estimated intercept beta1hat[i] = coef(mymodel)[2] ## change this line of code so 01 is now the estimated slope
}
ests = data.frame(cbind(beta0hat, beta1hat))
The general structure has been provided for you. The last two lines of the for loop have placeholder values of 00 and 01. Adjust these values that are saved at the end of the for loop to be the appropriate values for beta0hat and beta1hat.
part b
Plot a histogram of beta1hat values based on your simulation. Describe this histogram, including comments on the shape, center, spread, and outliers.
# Use this code chunk for your answer.
hist(beta1hat)
Histogram of beta1hat
beta1hat
part c
Import the skeptic.csv data (found on Canvas) and fit a SLR model. (Check the variable names in the skeptic dataset, as the names should indicate which to use as your y variable and which for your x variable). Print the estimated coefficient for β1.
# Use this code chunk for your answer.
setwd("~/Desktop/data") df = read.csv("skeptic.csv") lm1 = lm(y ~ x, data = df)
coef(lm1)[2]
## x
## 0.6895595
summary(lm1)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max ## -15.4987 -4.1277 -0.1347 3.6636 15.3081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.7269 0.5430 8.705 1.73e-14 ***
## x 0.6896 0.4939 1.396 0.165
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.053 on 123 degrees of freedom
## Multiple R-squared: 0.0156, Adjusted R-squared: 0.007597 ## F-statistic: 1.949 on 1 and 123 DF, p-value: 0.1652 part d
Your goal for this part of the question is to determine if you think the skeptic data could reasonably have been simulated from the model described at the beginning of this exercise. I won’t tell you exactly how to do that, but I will leave the following hint:
If the skeptic data really was simulated from this model, then how unusual is the βˆ1 we found from the skeptic data? How can our simulated βˆ1 values help us quantify this answer?
# Use this code chunk for your answer.
summary(lm1)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max ## -15.4987 -4.1277 -0.1347 3.6636 15.3081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.7269 0.5430 8.705 1.73e-14 ***
## x 0.6896 0.4939 1.396 0.165
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.053 on 123 degrees of freedom
## Multiple R-squared: 0.0156, Adjusted R-squared: 0.007597 ## F-statistic: 1.949 on 1 and 123 DF, p-value: 0.1652
test_stat2 = (0.6896 - 0)/0.4939 test_stat2
## [1] 1.396234 part e
Based on your investigation in part d, do you think the skeptic data could have been simulated from the model provided in part a?
Exercise 5: Formatting
The last five points of the assignment will be earned for properly formatting your final document. Check that you have:
• included your name on the document
• properly assigned pages to exercises on Gradescope
• selected page 1 (with your name) and this page for this exercise (Exercise 5)
• all code is printed and readable for each question
• all output is printed
• generated a pdf file