Starting from:

$30

MATH4330 Assignment 2 -Solved

This data set is from the Duke University Cardiovascular Disease Databank and consists of 2258 patients 
and 6 variables. The patients were referred to Duke University Medical Center for chest pain. The variables 
included in the data set acath2.csv are the following: 
• sex: sex of the patient (0=male, 1=female) 
• age: age of the patient 
• cad.dur: duration of symptoms of coronary artery disease 
• cholest: cholesterol (in mg) 
• sigdz: significant coronary disease by cardiac catheterization (definied as ≥ 75$ diameter narrowing in 
at least one important coronry artery - 1 = yes, 0 = no) 
• tvdm: severe coronary disease (definied as three vessel or left main disease by cardiac catheterization - 
1 = yes, 0 = no)) 
(a) In R create a new vector that dichotomizes cholest into high and low, where the cutoff 
is the median of cholest. Calculate the odds ratio for significant coronary disease based on high/low 
cholesterol. Interpret the odds ratio. 
(b) Do the same as part (a), but use severe coronary disease instead of significant coronary 
disease. 
(c)  Do you think you could estimate the risk ratio for significant or severe coronary disease in 
this example? If yes, then estimate the risk ratios for the relationships investigated in part (a) and (b). 
If not, say why. In either case justify your answer. 
Solution 
Part A 
summary(A2df) 
## sex age cad.dur cholest 
## Min. :0.0000 Min. :17.00 Min. : 0.00 Min. : 29.0 
## 1st Qu.:0.0000 1st Qu.:45.00 1st Qu.: 6.00 1st Qu.:196.0 
## Median :0.0000 Median :51.00 Median : 19.00 Median :224.5 
## Mean :0.3051 Mean :50.82 Mean : 41.91 Mean :229.9 
## 3rd Qu.:1.0000 3rd Qu.:57.00 3rd Qu.: 58.00 3rd Qu.:259.0 
## Max. :1.0000 Max. :81.00 Max. :416.00 Max. :576.0 
## sigdz tvdlm 
## Min. :0.0000 Min. :0.0000 
## 1st Qu.:0.0000 1st Qu.:0.0000 
## Median :1.0000 Median :0.0000 
1Solution 
4630 Assignment 1 R Code Ravish Kamath 213893664 
## Mean :0.6599 Mean :0.3202 
## 3rd Qu.:1.0000 3rd Qu.:1.0000 
## Max. :1.0000 Max. :1.0000 
Observing our summary of the data set, clearly the median of the variable, cholest, would be 224.5. 
A2df$cholest = dicho(A2df$cholest, dich.by = 'median') 
summary(A2df$cholest) 
## 0 1 
## 1129 1129 
Since we are splitting it based of its median, we can see based of the summary there are 1129 in both levels. 
0 = cholesterol level below 224.5, and 1 = cholesterol level is above 224.5. 
Now let us proceed with the calculation of the odds ratio for significant coronary disease based on high/low 
cholesterol. 
odds_tb = table(A2df$cholest, A2df$sigdz) 
colnames(odds_tb) = c("non-signficant coronary", "significant coronary") 
rownames(odds_tb) = c('chol < 224.5', 'chol > 224.5') 
odds_tb 
## 
## non-signficant coronary significant coronary 
## chol < 224.5 455 674 
## chol > 224.5 313 816 
n = sum(odds_tb) 
chol_great= sum(odds_tb[2,]) 
chol_low = sum(odds_tb[1,]) 
#P(significant coronary disease | high cholesterol) 
sigcoron_high_prob = odds_tb[2,2]/ chol_great 
#P(significant coronary disease | low cholesterol) 
sigcoron_low_prob = odds_tb[1,2]/ chol_low 
odds_ratio = (sigcoron_high_prob/(1 - sigcoron_high_prob))/ 
(sigcoron_low_prob/(1 - sigcoron_low_prob)) 
odds_ratio 
## [1] 1.759938 
With the odds ratio being greater than one, the individuals with higher cholesterol will have a higher odds of 
having significant coronary disease. 
2Solution 
4630 Assignment 1 R Code Ravish Kamath 213893664 
Part B 
Here is the calculation of the odds ratio for severe coronary disease based on high/low cholesterol. 
odds_tb = table(A2df$cholest, A2df$tvdlm) 
colnames(odds_tb) = c("non-severe coronary", "severe coronary") 
rownames(odds_tb) = c('chol < 224.5', 'chol > 224.5') 
odds_tb 
## 
## non-severe coronary severe coronary 
## chol < 224.5 810 319 
## chol > 224.5 725 404 
chol_great= sum(odds_tb[2,]) 
chol_low = sum(odds_tb[1,]) 
#P(severe coronary disease | high cholesterol) 
sevcoron_high_prob = odds_tb[2,2]/ chol_great 
#P(severe coronary disease | low cholesterol) 
sevcoron_low_prob = odds_tb[1,2]/ chol_low 
odds_ratio = (sevcoron_high_prob/(1 - sevcoron_high_prob))/ 
(sevcoron_low_prob/(1 - sevcoron_low_prob)) 
odds_ratio 
## [1] 1.414939 
With the odds ratio being greater than one, the individuals with higher cholesterol will have a higher odds of 
having severe coronary disease. 
Part C 
We cannot estimate the risk ratio for either significant nor sever coronary disease. It is clear that this is 
a retrospective study/case control since we are already sampling relative to the outcome, which would be 
coronary heart disease. Furthermore, a quick Google search shows us that coronary heart disease is quite 
common, and hence the probability of coronary heart disease is greater than 5%. Hence we cannot use the 
odds ratio to estimate the risk ratio. 
34630 Assignment 1 R Code Ravish Kamath 213893664 
Question 2 
Use the same data set as in Question 1. Run a linear regression model to investigate the joint effect of age 
and sex on cholesterol (mg). 
(a) [3 points] Write out the fitted regression model based on the R output. Interpret all the estimated β 
parameters in the model. 
(b) [2 points] Calculate and report 95% confidence intervals for the coefficients of age and sex. Interpret. 
(c) [2 points] Predict the cholesterol of a 50-year-old female. 
(d) [2 points] Predict the cholesterol of a 10-year-old male. Are you less confident in this prediction than 
the one you made in part (c)? Why? 
Solution 
Part A 
fit <- lm(A2df$cholest~ age + sex, data=A2df) 
s.fit <- summary(fit) 
s.fit 
## 
## Call: 
## lm(formula = A2df$cholest ~ age + sex, data = A2df) 
## 
## Residuals: 
## Min 1Q Median 3Q Max 
## -208.28 -34.10 -4.92 28.47 339.19 
## 
## Coefficients: 
## Estimate Std. Error t value Pr(>|t|) 
## (Intercept) 224.96743 5.83456 38.558 < 2e-16 *** 
## age 0.03886 0.11307 0.344 0.731 
## sex 9.78549 2.31139 4.234 2.39e-05 *** 
## --- 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 50.43 on 2255 degrees of freedom 
## Multiple R-squared: 0.008077, Adjusted R-squared: 0.007198 
## F-statistic: 9.182 on 2 and 2255 DF, p-value: 0.0001068 
Our model will be: 
cholesterol = 224.96743 + 0.03886Age + 9.78549Sex 
.
β0 interpretation: Given that the age of the individual is 0 year old and male,their cholesterol level will be 
224.96743 
β1 interpretation: Given that sex is a constant, for every increase in unit of Age, the cholesterol level would 
increase by 0.03886mg. 
β2 interpretation: Given that age is constant, then cholesterol level would increase by 9.78549 if they are 
female. 
4Solution 
4630 Assignment 1 R Code Ravish Kamath 213893664 
Part B 
confint(fit, level=0.95) 
## 2.5 % 97.5 % 
## (Intercept) 213.5257660 236.4090892 
## age -0.1828807 0.2605961 
## sex 5.2528151 14.3181654 
Age C.I. 
Our confidence interval for α = 0.05 would be (-0.1828807, 0.2605961). 
Sex C.I. 
Our confidence interval for α = 0.05 would be (5.258151, 14.3181654). 
Part C 
new.x <- data.frame(age= 50, sex= 1) 
predict(fit, newdata = new.x) 
## 1 
## 236.6958 
Based of the model we would predict that their cholesterol level would be 236.6958mg for a 50 year old female. 
Part D 
Based of the model we would predict that their cholesterol level would be 225.356 for a 10 year old male. 
Yes I would be less confident in this prediction. Firstly, most of these patients are experiencing chest 
pain, which would be common within older people, rather than someone who is 10 year’s old. Younger people 
tend to have a much healthier heart do you to younger age. To have a 225mg cholesterol level, would seem 
too unrealistic for that individual, unless they have an obesity issue. 
5

More products