Starting from:

$30

MATH4330 Assignment 4 -Solved

The file reform.csv has a cross-sectional subsample from the German Socio-Economic Panel, which collected 
data on doctor visits before and after a major health care reform that took place in 1997. The reform increased 
the copayments for prescription drugs by up to 200% and imposed upper limits on the reimbursement of 
physicians by the state insurance. The outcome is the number of doctor visits in a three month period. The 
descriptions of the variables are: 
-id:The patient’s ID number 
-numvisit: Number of doctor visits in a 3-month period 
-reform: Before (reform=0) or after (reform=1) the reform 
-badh: Person is in bad health? (1=yes, 0=no) 
-age: Age in years 
-educ: Education in years 
-loginc: Logarithm of income 
(a) Fit a Poisson regression model (with all possible predictors included, other than id) to see 
whether the reform affected the number of doctor visits. Formulate the model so that the estimated 
rates are per 1-month of follow-up. Report the rate ratio for the reform variable and interpret. 
(b)  Calculate and interpret the rate ratio for age. 
(c) Estimate the 1-month rate of visits for a 30 year old person before the reform, with 12 years 
education, in bad health, and with an average income (so that loginc = 7.6989). Be sure to specify the 
proper units of the rate. 
(d) Estimate the 1-year rate for the person from part (c). 
(e) 2Compute the ratio of 1-month visit rates corresponding to a 10 year increase in age, assuming 
all other variables are held constant. 
1Solution 
4330 Assignment 4 R Code Ravish Kamath 213893664 
Solution 
Part A 
To fit a Poisson regression model with all the possible predictors included in the question, we will use the glm 
function. However we first need to deal with the offset situation. We will mutate our current reform data to 
have another column of the number 3 to represent the period of time for the number of doctor visits. This 
has been previously added however we can show the first 6 data rows look like 
head(reform) 
## id numvisit reform badh age educ loginc period 
## 1 3 1 0 0 45 10.5 7.636776 3 
## 2 4 9 0 1 53 9.0 7.699212 3 
## 3 7 40 0 1 48 10.5 7.057358 3 
## 4 12 0 1 0 52 18.0 7.688554 3 
## 5 22 1 1 0 42 10.5 7.331879 3 
## 6 26 0 0 1 57 10.5 7.428922 3 
Now we can run our Possion regression as shown below our code. 
fit <- glm(numvisit ~ reform + badh + age + educ + loginc + offset(log(period)), 
data= reform, family = poisson) 
summary(fit)$coefficients 
## Estimate Std. Error z value Pr(>|z|) 
## (Intercept) -1.2728170897 0.316772037 -4.01808538 5.867294e-05 
## reform -0.2273629035 0.031526364 -7.21183392 5.520328e-13 
## badh 1.1726351709 0.035255216 33.26132449 1.400250e-242 
## age 0.0049815098 0.001473348 3.38108219 7.220094e-04 
## educ -0.0006805681 0.006873469 -0.09901377 9.211273e-01 
## loginc 0.1119395623 0.042706815 2.62111711 8.764215e-03 
The reported rate ratio will be the exponential of the reform coefficient in our model named fit. 
RR = exp(fit$coefficients[2]) 
RR 
## reform 
## 0.7966316 
The rate of number of doctor visits after the health care reform that took place in 1997 is only 79% as high 
as the rate when the reform was not in effect. 
Part B 
To calculate the rate ratio for age, we will once again take the exponential of the age coefficient in our 
model named fit. 
RR = exp(fit$coefficients[4]) 
RR 
## age 
## 1.004994 
To interpret this value,the expected number of doctor visits, as age increases by 1 year, would increase by 
0.4%. 
2Solution 
4330 Assignment 4 R Code Ravish Kamath 213893664 
Part C 
We will now estimate the 1-month rate of visits for a 30 year old individual with 12 years of education, bad 
health, and the log incomce of 7.6989. 
ndat <- data.frame(age = 30, reform = 0, badh = 1, educ = 12, loginc = 7.6989, 
period = 1) 
# Predict with type=response will give the estimated rate. 
pred.rate <- predict(fit, newdata = ndat, type="response") 
pred.rate 
## 1 
## 2.466766 
To conclude, the estimated number of doctor visits for that specific individual would be between 2-3 visits 
per month. 
Part D 
We will estimate the 1-year rate for the same individual that was estimated in part (c). 
ndat <- data.frame(age = 30, reform = 0, badh = 1, educ = 12, loginc = 7.6989, 
period = 12) 
pred.rate1 <- predict(fit, newdata = ndat, type="response") 
pred.rate1 
## 1 
## 29.60119 
Hence, the estimated number of doctor visits for that specific individual would be between 29-30 visits per 
year. 
Part E 
exp(10*fit$coefficients[4]) 
## age 
## 1.051077 
34330 Assignment 4 R Code Ravish Kamath 213893664 
Question 2 
Assume that you have an outcome variable Yi and predictor variable xi for i = 1, ..., n, with independent 
observations. Furthermore, assume that you have the following model: 
Yi|xi ∼ Poission(λi) 
with 
log(λi) = βxi 
(a) [3 Points] Find the log-likelihood ℓ(β|yi). 
(b) [5 Points] Assume you want to find the maximum likelihood estimate of β. Derive the Newton-Raphson 
update for β(k+1) given you have β(k). 
Solution 
Part A 
Let the likelihood function be: 
L(β|yi) = 
n
Y
i=1 
e−
λ
λ
yi 
y
i

Then the log-likelihood will be: 
ℓ(β|yi) = 
n
X
i=1 
log(eλiλy


) − log(yi !) 

n
X
i=1 
−λi + 
n
X
i=1 
yi log(λi) − 
n
X
i=1 
log(yi !) 
Now log(λi) = βxi and λi = e βxi 
Hence we get the log-likelihood to be: 
ℓ(β|yi) = 
n
X
i=1 
−e βxi + 
n
X
i=1 
yiβxi − 
n
X
i=1 
log(yi !) 
4Solution 
4330 Assignment 4 R Code Ravish Kamath 213893664 
Part B 
∂ℓ(β|yi) 
∂β 

n
X
i=1 
−xie βxi + 
n
X
i=1 
yixi 

n
X
i=1 
xi(yi − e βxi ) 
= ℓ′(β|yi) 
∂2ℓ(β|yi) 
∂β2 
= − 
n
X
i=1 
x2i e βxi 
= ℓ ′′(β|yi) 
Hence the Newton-Raphson method will be as follows: 
β(k+1) = βk − ℓ
′(
β
|
y
i

ℓ 
′′
(
β
|
y
i

= βk + P n
i=1 
xi(yi 
− 

βxi ) 

n
i=1 x
2


βx

54330 Assignment 4 R Code Ravish Kamath 213893664 
Question 3 
Refer to the model in Question 2. Use the optim() function in R to find the maximum-likelihood 
estimate of β. Test the function using data from reform.csv with numvisit as Yi and age as xi . Don’t use an 
offset for this part. 
Solution 
fit = glm(numvisit ~ age - 1, data = reform, family = poisson) 
summary(fit)$coefficients 
## Estimate Std. Error z value Pr(>|z|) 
## age 0.02558795 0.0003843192 66.57995 0 
yi = reform$numvisit 
xi = reform$age 
log.lik.pr = function(par){ 
b = par[1] 
lam = exp(b*xi) 
-sum(dpois(yi, lambda = lam, log = TRUE)) 

opt.pr = optim(par = list(b = 1), fn = log.lik.pr, method = "Brent", 
lower=-10, upper=10) 
opt.pr$par 
## [1] 0.02558795 
As we can see in both the outputs, we arrive to the same β value, when using either the glm function or the 
Newton-Raphson algorithm. 
64330 Assignment 4 R Code Ravish Kamath 213893664 
Question 4 
The file adult data clean.csv contains education, demographic, and in- come information from the US census 
database as of 1994. The three variables of interest are: 
-education: The education level (HS-grad, Bachelors, Masters, Doctorate) 
-age: The age of the individual 
-sex: The sex of the individual 
You should run the following code to make sure that HS-grad is the reference category for education, assuming 
that you read the csv file into a data object called a.data: 
a.data$education <- factor(a.data$education) 
a.data$education <- relevel(a.data$education, ref=“HS-grad”) 
(a)  Run a multinomial logistic regression model with education as the out- come and age and 
sex as predictors. Make sure that HS-grad is the reference category for the outcome. 
(b) [6 Points] Using odds ratios calculated from the model fit, describe the effects of age and sex on the 
odds of an individual having a Bachelors, Masters, and Doctorate (each compared to the reference 
category). 
Solution 
Part A 
To run the multinomial regression, we will be using the nnet package. Our reference category will be HS-grad. 
When running the model we get: 
fit = 
multinom(education~ age + sex, data = ad_ed) 
## # weights: 16 (9 variable) 
## initial value 24942.208145 
## iter 10 value 18781.483905 
## iter 20 value 17540.685828 
## final value 17540.685020 
## converged 
fit 
## Call: 
## multinom(formula = education ~ age + sex, data = ad_ed) 
## 
## Coefficients: 
## (Intercept) age sexMale 
## Bachelors -0.7131205 -0.0006913149 0.096781173 
## Doctorate -5.6204287 0.0462546638 0.528634166 
## Masters -3.0003569 0.0286641455 0.007771833 
## 
## Residual Deviance: 35081.37 
## AIC: 35099.37 
7Solution 
4330 Assignment 4 R Code Ravish Kamath 213893664 
Part B 
Let us first make a few tables to indicate each category of education vs. a HS-Grad. 
8Solution 
4330 Assignment 4 R Code Ravish Kamath 213893664 
Odds of an individual with a bachelors vs. high school graduate: 
Age: e−0.007 = 0.9930 
The odds of having a bachelors vs. a high school graduate is 0.9930 given that age differs by 1 year and 
their sex is constant. 
Sex: e0.09678 = 1.1016 
The odds of having a bachelors vs. a high school graduate is 0.9930 given that the sex of the individual 
is male and their age is constant. 
Odds of an individual with a masters vs. high school graduate: 
Age: e0.0287 = 1.0291 
The odds of having a masters vs. a high school graduate is 1.0291 given that age differs by 1 year and 
their sex is constant. 
Sex: e0.0078 = 1.0078 
The odds of having a masters vs. a high school graduate is 1.0078 given that the sex of the individual is 
male and their age is constant. 
Odds of an individual with a doctorate vs. high school graduate: 
Age: e0.0463 = 1.0474 
The odds of having a doctorate vs. a high school graduate is 1.0474 given that age differs by 1 year and 
their sex is constant. 
Sex: e0.5286 = 1.6966 
The odds of having a doctorate vs. a high school graduate is 1.6966 given that the sex of the individual 
is male and their age is constant. 
9

More products