STAT292 Assignment 5: Logistic Regression


1.   Table 1 presents a subset of data collected by Väisänen and Järvinen (1977) on bird species in the Krunnit Islands archipelago of Finland. In particular, they reported on the bird species found on each of the islands in 1949 and how many of those bird species were extinct by 1970. It is of interest to understand whether the area of the island (in km²) is associated with species' survival. The data corresponding to Table 1 are available in the Excel file Extinction.xlsx.

 
 
Island            Area (X)    Extinct? Yes    Extinct? No
Ulkokrunni        185.80      5               70
Maakrunni         105.80      3               64
Ristikari         30.70       10              56
Isonkivenletto    8.50        6               45
Hietakraasukka    4.80        3               25
Kraasukka         4.50        4               16
Länsiletto        4.30        8               35

Table 1: Extinction of bird species from 1949 to 1970 on seven islands in the Krunnit Islands archipelago, Finland.

Fit the logistic regression model

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X,$$

where X denotes island area and p(X) denotes the probability of extinction.
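
As a cross-check on the SAS output, the model can also be fitted directly from the counts in Table 1. The following is a minimal pure-Python sketch of a Newton-Raphson fit for grouped binomial data. The function names (expit, score_and_information, fit_logistic), the starting values, and the convergence tolerance are assumptions of this example and are not taken from the assignment or from SAS.

```python
import math

# Table 1: island area (km^2), species extinct by 1970, species not extinct.
area = [185.80, 105.80, 30.70, 8.50, 4.80, 4.50, 4.30]
extinct = [5, 3, 10, 6, 3, 4, 8]
not_extinct = [70, 64, 56, 45, 25, 16, 35]
totals = [e + s for e, s in zip(extinct, not_extinct)]


def expit(t):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def score_and_information(b0, b1, x, y, n):
    """Score vector and 2x2 Fisher information of the binomial log-likelihood."""
    p = [expit(b0 + b1 * xi) for xi in x]
    u0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
    u1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return (u0, u1), (i00, i01, i11)


def fit_logistic(x, y, n, tol=1e-10, max_iter=50):
    """Maximum-likelihood fit of logit p(x) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, cov), where cov is the inverse information matrix
    evaluated at the final estimates.
    """
    b0 = math.log(sum(y) / (sum(n) - sum(y)))  # intercept-only estimate
    b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1, x, y, n)
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score (2x2 closed form).
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y, n)
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return b0, b1, cov


b0, b1, cov = fit_logistic(area, extinct, totals)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"se(b0) = {math.sqrt(cov[0][0]):.5f}, se(b1) = {math.sqrt(cov[1][1]):.5f}")
```

If SAS fits the same model by maximum likelihood, the printed estimates and standard errors should agree with Figure 1 up to rounding.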

Figure 1 shows relevant SAS output for the logistic regression model.

(a)    Carry out an appropriate goodness-of-fit test to determine whether the model provides a good fit to the data. State the hypotheses, and give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?
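
For grouped binomial data, goodness-of-fit statistics compare the observed counts with the fitted counts. The sketch below shows one way to compute the deviance and Pearson statistics and a chi-square p-value in plain Python. The function names are assumptions of this example, and which of the two statistics Figure 1 reports is not assumed here. To stay independent of any fitted coefficients, the usage lines evaluate the intercept-only model, whose fitted probability has the closed form (total extinct) / (total species), on 7 − 1 = 6 degrees of freedom. For the model in the question, pass the fitted probabilities from β0 + β1X instead and use 7 − 2 = 5 degrees of freedom.

```python
import math

# Table 1 counts: extinct species (y) and species present in 1949 (n).
y = [5, 3, 10, 6, 3, 4, 8]
n = [75, 67, 66, 51, 28, 20, 43]


def deviance(y, n, p_hat):
    """Deviance G^2 comparing observed binomial counts with fitted counts."""
    g2 = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        mu = ni * pi
        if yi > 0:  # a zero count contributes nothing to the sum
            g2 += yi * math.log(yi / mu)
        if ni - yi > 0:
            g2 += (ni - yi) * math.log((ni - yi) / (ni - mu))
    return 2.0 * g2


def pearson(y, n, p_hat):
    """Pearson X^2 statistic for grouped binomial data."""
    return sum((yi - ni * pi) ** 2 / (ni * pi * (1.0 - pi))
               for yi, ni, pi in zip(y, n, p_hat))


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)  # tail probability for df = 2
        total = term
        k = 2
        while k < df:
            term *= half / (k / 2)
            total += term
            k += 2
    else:
        total = math.erfc(math.sqrt(half))  # tail probability for df = 1
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        k = 1
        while k < df:
            total += term
            term *= x / (k + 2)
            k += 2
    return total


# Intercept-only model: every island gets the pooled extinction proportion.
p0 = sum(y) / sum(n)
p_hat = [p0] * len(y)
df = len(y) - 1
g2 = deviance(y, n, p_hat)
x2 = pearson(y, n, p_hat)
print(f"G^2 = {g2:.4f}, p = {chi2_sf(g2, df):.4f} on {df} df")
print(f"X^2 = {x2:.4f}, p = {chi2_sf(x2, df):.4f} on {df} df")
```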

(b)   Give estimates of β0 and β1 (up to 5dp).

(c)    Interpret the association between island area and extinction using the odds ratio. Demonstrate how the odds ratio is calculated from Figure 1. Additionally, provide a 95% confidence interval for the odds ratio.
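
The odds ratio for a one-unit increase in X is exp(β1), and a Wald-type confidence interval is obtained by exponentiating the endpoints of the interval for β1. The sketch below illustrates the arithmetic, assuming a Wald interval is what is wanted. The function name odds_ratio_ci and the numeric inputs are hypothetical placeholders, not the values in Figure 1.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95, delta=1.0):
    """Odds ratio for a `delta`-unit increase in X with a Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    estimate = math.exp(delta * beta)
    end_a = math.exp(delta * (beta - z * se))
    end_b = math.exp(delta * (beta + z * se))
    return estimate, min(end_a, end_b), max(end_a, end_b)


# HYPOTHETICAL placeholder inputs: replace with the estimate and standard
# error of the area coefficient reported in the SAS output.
beta1_hat, se_beta1 = -0.01, 0.004
est, lower, upper = odds_ratio_ci(beta1_hat, se_beta1)
print(f"OR = {est:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because area is measured in km², the default delta=1 describes the change in odds per additional square kilometre; a larger delta (for example 10) gives the odds ratio for a 10 km² difference.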



Figure 1: Summary output for the logistic regression model $\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$.

(d)   Find the predicted probability of extinction for an island with an area of 50 km² (to 4dp).

(e)   Find the fitted count of extinct bird species on the island of Ulkokrunni (to 2dp). Also find the fitted count of non-extinct bird species on Ulkokrunni (to 2dp).
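
A predicted probability is obtained by inverting the logit at the chosen area, and a fitted count is that probability multiplied by the number of species on the island. The sketch below illustrates both steps. The function names and the coefficient values are hypothetical placeholders, so the printed numbers are not the answers to the question; only the area of 50 km² and the Ulkokrunni row (area 185.80, 5 + 70 species) come from the text.

```python
import math


def predicted_probability(b0, b1, x):
    """p(x) = 1 / (1 + exp(-(b0 + b1*x))), the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fitted_counts(b0, b1, x, n):
    """Fitted (extinct, not extinct) counts for an island with n species."""
    p = predicted_probability(b0, b1, x)
    return n * p, n * (1.0 - p)


# HYPOTHETICAL placeholder coefficients: replace with the estimates of
# beta0 and beta1 from the SAS output.
B0, B1 = -1.5, -0.01

p50 = predicted_probability(B0, B1, 50.0)
ext, surv = fitted_counts(B0, B1, 185.80, 5 + 70)  # Ulkokrunni row of Table 1
print(f"p(50) = {p50:.4f}")
print(f"Ulkokrunni fitted counts: extinct = {ext:.2f}, not extinct = {surv:.2f}")
```

The two fitted counts always sum to the island's total number of species, which is a quick check on the arithmetic.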

(f)    Test

H0 : β1 = 0

H1 : β1 ≠ 0

using the Wald statistic. Give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?
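
The Wald statistic divides the estimate by its standard error; its square is referred to a chi-square distribution with 1 degree of freedom, which is equivalent to a two-sided standard normal test. The sketch below shows the calculation. The function name wald_test and the numeric inputs are hypothetical placeholders, not values from Figure 1.

```python
import math


def wald_test(beta, se):
    """Wald z statistic, Wald chi-square (1 df), and two-sided p-value."""
    z = beta / se
    chi_sq = z * z
    # P(chi2_1 > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi_sq, p_value


# HYPOTHETICAL placeholder inputs: replace with the estimate and standard
# error of beta1 from the SAS output.
z, chi_sq, p = wald_test(-0.01, 0.004)
print(f"z = {z:.4f}, Wald chi-square = {chi_sq:.4f}, p-value = {p:.4f}")
```

H0 is rejected at the α = 0.05 level when the p-value is below 0.05, or equivalently when the Wald chi-square exceeds 3.84.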

2.   Consider data reported by Gilbert (1981) on the relationship between pre-marital sex (i.e., sexual intercourse before marriage), extra-marital sex (i.e., sexual intercourse with someone other than a spouse whilst married), and whether the person had been divorced, for a random sample of heterosexual men and women who had been married at least once. These data are presented in Table 2 and are available in the Excel file Divorce.xlsx.



Gender (W)    Pre-marital Sex (X)    Extra-marital Sex (Y)    Divorced? (Z) No    Divorced? (Z) Yes





Table 2: Data on reported pre-marital sex, extra-marital sex, and divorce for a random sample of heterosexual men and women.

First, use the backward model selection method to find the simplest model that provides a good fit to the data. Start from the following model, which we will denote by M2,

,

where pijk is the probability of divorce when the gender (W) is at level i, pre-marital sex status (X) is at level j, and extra-marital sex status (Y ) is at level k.

Figure 2 shows relevant summary output from SAS.

(a)    Is model M2 a saturated model? Why or why not?

(b)   What information does Step 1 provide in the SAS output? Write down the test hypotheses. What do you conclude?



Figure 2: Summary output for the backward selection method applied to the logit model M2.

(c)    What is the final model?

Now consider the logit model, which we will denote by M1,

.

which uses a reference level parametrisation for all factors.

Figure 3 shows relevant summary output from SAS.

(d)   Carry out an appropriate goodness-of-fit test to determine whether model M1 provides a good fit to the data. State the hypotheses, and give the test statistic and the p-value of the test. What do you conclude at the α = 0.05 significance level?

(e)    Compare the odds of divorce for men with the odds of divorce for women using an odds ratio, and interpret this odds ratio. Give a 95% confidence interval for the odds ratio.



Figure 3: Summary output for the logit model M1.
