DSC423 - Data Analysis and Regression - Assignment 02 - Solved

1a) An R-squared value of 0.69 means that 69% of the variability in weight is explained by the model. Since 0.69 is slightly below 0.70, it is a relatively low R-squared value. The model may perform poorly, in which case we may need additional predictors or further diagnostics before relying on it.

There is no doubt that R-squared is an important statistic. It measures the explanatory power of the regression model on a scale of 0 to 100%; in other words, it tells us how well the regression model fits the dataset.

In general, the greater the R-squared value, the better the fit. But R-squared is not the only criterion for judging the quality of a model. For example, we still need diagnostics such as residual plots to determine whether the model is biased.
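As a minimal sketch of the two checks mentioned above, the code below computes R-squared from its definition and draws a residual plot. It uses R's built-in cars data purely for illustration; the dataset and variable names are assumptions and are not part of the assignment.

```r
# Illustrative only: built-in `cars` data, not the assignment's dataset.
fit <- lm(dist ~ speed, data = cars)

# R-squared from its definition: 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((cars$dist - mean(cars$dist))^2)
r2_manual <- 1 - ss_res / ss_tot

# The same quantity as reported by summary()
r2_summary <- summary(fit)$r.squared

# Residuals vs fitted values: curvature or a funnel shape suggests
# the linear model is biased or the error variance is not constant.
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A high R-squared with a clearly patterned residual plot would still point to a poorly specified model, which is why both checks are useful together.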

 

1b)

The regression fallacy is the mistake of assuming that some corrective action caused a result to return to normal, when an unusual result would have moved back toward normal anyway.

It fails to account for natural variation (regression toward the mean).

For example, Joe had a fever one week after getting the COVID vaccination. It could be a side effect of the vaccine, but it could also be due to catching a cold or the flu.

As another example, she was drowsy after taking melatonin and a pain reliever. It is difficult to say whether the melatonin or the painkiller caused the drowsiness.
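The role of natural variation can be illustrated with a small simulation. This is only a sketch on synthetic data: the variable names (level, first, second), the sample size, and the seed are all assumptions chosen for illustration. No intervention is applied between the two measurements, yet the cases that looked worst at first improve on average.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
n <- 10000
level  <- rnorm(n)              # stable underlying level of each case
first  <- level + rnorm(n)      # first measurement = level + random noise
second <- level + rnorm(n)      # second measurement, nothing was changed

# Pick the 10% of cases that looked worst on the first measurement
worst <- first < quantile(first, 0.10)

mean_before <- mean(first[worst])
mean_after  <- mean(second[worst])

cat("Worst group, first measurement: ", mean_before, "\n")
cat("Worst group, second measurement:", mean_after, "\n")
```

Crediting any treatment given to the worst group for this improvement would be an instance of the regression fallacy, since the improvement appears without any treatment.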

 

Citation:

Wikimedia Foundation. (2021, May 11). Regression fallacy. Wikipedia. https://en.wikipedia.org/wiki/Regression_fallacy.

 

 

【Question 2】

2a)

Y1 <- QUASAR$RFEWIDTH

X1 <- QUASAR$REDSHIFT

X2 <- QUASAR$LINEFLUX

X3 <- QUASAR$LUMINOSITY

X4 <- QUASAR$AB1450

X5 <- QUASAR$ABSMAG

 

model1 <- lm(Y1 ~ X1)

summary(model1)

 

model2 <- lm(Y1 ~ X2)

summary(model2)

 

model3 <- lm(Y1 ~ X3)

summary(model3)

 

model4 <- lm(Y1 ~ X4)

summary(model4)

 

model5 <- lm(Y1 ~ X5)

summary(model5)

 

2b) <1> model1 <- lm(Y1 ~ X1) summary(model1)

  

This model represents a regression relationship between Y1 (rest frame equivalent width) and X1 (redshift).

The intercept of this model is 112.115.

Based on R-squared, 0.5073% of the variability in rest frame equivalent width is explained by the model. Since R-squared is the coefficient of determination and its value is lower than 0.70, we do not consider this a good model.

 

<2> model2 <- lm(Y1 ~ X2) summary(model2)

  

This model represents a regression relationship between Y1 (rest frame equivalent width) and X2 (line flux).

The intercept of this model is 665.77.

Based on R-squared, 4.365% of the variability in rest frame equivalent width is explained by the model. Since R-squared is the coefficient of determination and its value is lower than 0.70, we do not consider this a good model.

 

<3> model3 <- lm(Y1 ~ X3) summary(model3)

  

This model represents a regression relationship between Y1 (rest frame equivalent width) and X3 (line luminosity).

The intercept of this model is -1978.21.

Based on R-squared, 3.611% of the variability in rest frame equivalent width is explained by the model. Since R-squared is the coefficient of determination and its value is lower than 0.70, we do not consider this a good model.

 

<4> model4 <- lm(Y1 ~ X4) summary(model4)

  

This model represents a regression relationship between Y1 (rest frame equivalent width) and X4 (AB1450 magnitude).

The intercept of this model is -667.31.

Based on R-squared, 30.24% of the variability in rest frame equivalent width is explained by the model. Since R-squared is the coefficient of determination and its value (0.3024) is still lower than 0.70, we do not consider this a good model by that threshold, although it explains considerably more variability than models 1 to 3.

 

<5> model5 <- lm(Y1 ~ X5) summary(model5)

  

This model represents a regression relationship between Y1 (rest frame equivalent width) and X5 (absolute magnitude).

The intercept of this model is 1263.64.

Based on R-squared, 37.24% of the variability in rest frame equivalent width is explained by the model. Since R-squared is the coefficient of determination and its value (0.3724) is still lower than 0.70, we do not consider this a good model by that threshold, although it has the highest R-squared of the five models.
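The intercepts and R-squared values quoted for the five models can also be read from a fitted model programmatically rather than copied from the printed summary. The sketch below assumes R's built-in mtcars data as a stand-in, since the QUASAR data frame is not reproduced here; the variables mpg and wt are illustrative only.

```r
# Illustrative only: built-in `mtcars` stands in for the QUASAR data frame.
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["wt"])
r2        <- s$r.squared      # a proportion between 0 and 1
r2_pct    <- 100 * r2         # the same quantity expressed as a percentage

cat(sprintf("Intercept: %.3f\nSlope: %.3f\nR-squared: %.4f (%.2f%%)\n",
            intercept, slope, r2, r2_pct))
```

Keeping the proportion and the percentage as separate variables helps avoid mixing the two scales when comparing a value against a threshold such as 0.70.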

 

2c)

Model 5 (Y1: rest frame equivalent width ~ X5: absolute magnitude) is the best model. It has the highest R-squared value among the five models and a relatively low p-value. Thus, we consider it the best of the five.
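The comparison above can be sketched as a small helper that fits one simple regression per predictor and ranks the results by R-squared. The function name compare_single_predictors is hypothetical and not part of the assignment, and the demonstration uses R's built-in mtcars data as an assumption, since the QUASAR data frame is not available here.

```r
# Hypothetical helper (not from the assignment): fit one simple regression
# per predictor and collect R-squared and the p-value of the slope.
compare_single_predictors <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    s <- summary(fit)
    data.frame(
      predictor = p,
      r_squared = s$r.squared,
      p_value   = coef(s)[p, "Pr(>|t|)"],
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out[order(-out$r_squared), ]   # highest R-squared first
}

# Demonstration on built-in data, for illustration only
result <- compare_single_predictors(mtcars, "mpg", c("wt", "hp", "qsec"))
print(result)
```

With the assignment's data loaded, the same helper could presumably be called as compare_single_predictors(QUASAR, "RFEWIDTH", c("REDSHIFT", "LINEFLUX", "LUMINOSITY", "AB1450", "ABSMAG")), using the column names from part 2a.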
