STAT292 Assignment 4: Linear Regression

There are four questions, worth a total of 100 marks. Question 1 starts on page 2.

Assignment Guidelines (once more)
You are encouraged to discuss assignments with other students, but your submitted work must be your own.

The following Assignment Guidelines are helpful for all the assignments in Parts 2 and 3 of the course.

When you carry out a statistical test of hypothesis, you should state the following, when relevant:

•   Model equation.

•   Assumptions about the data, and comments about whether diagnostic graphs support those assumptions.

•   Null and alternative hypotheses.

•   ANOVA Table (if relevant), p-values.

•   Statistical conclusions. For example, “We reject H0 and conclude HA, that µ1 and µ2 differ at the 5% significance level”.

•   Interpretation of the statistical conclusions back to the original problem, using the original meaning of the response variable and any factors or covariates. For example, if comparing heights of two groups, “Female and male adults have different mean heights, with males being taller on average”.


1. Comprehension Test
Children in a school class are given a test of comprehension of English, marked out of 100. The children are from three different ethnic groups, which is thought to be an important factor. The question of interest is whether there are sex differences after allowing for ethnicity. The data follow:

 
Ethnic group   Females                 Males
E1             67 66 75 76 71 70 72    63 72 62 61 69 64 71 68 56
E2             69 57 55 63 65 55       59 47 49
E3             30 47                   39 33

(a)    A two-way ANOVA was run on the data, with SAS output given on pages 3 to 6. Present the results from the ANOVA following the usual Assignment Guidelines, as given on page 1.

(b)   If a one-way ANOVA is done with factor Sex, the resulting ANOVA table is:

Source   DF   Sum of Squares   Mean Square   F value   p-value
Sex       1          144.166       144.166      0.99    0.3292
Error    27         3942.662       146.025
Total    28         4086.828
 
 
 
Briefly discuss the outcomes of the separate tests for Sex presented in parts (a) and (b). Are the conclusions different? Give reasons to explain your answer.
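
As a check on the one-way table in part (b), the sums of squares can be recomputed directly from the data. The sketch below is illustrative only and uses plain Python. It assumes the data rows read, in order, E1 females, E1 males, E2 females, E2 males, E3 females, E3 males; that layout is an assumption about the table, and the names `scores` and `one_way_anova` are made up for this example. No p-value is computed, because the standard library has no F distribution.

```python
# Comprehension scores by cell.
# ASSUMPTION: rows are ethnic groups E1-E3, columns are Females / Males.
scores = {
    ("E1", "F"): [67, 66, 75, 76, 71, 70, 72],
    ("E1", "M"): [63, 72, 62, 61, 69, 64, 71, 68, 56],
    ("E2", "F"): [69, 57, 55, 63, 65, 55],
    ("E2", "M"): [59, 47, 49],
    ("E3", "F"): [30, 47],
    ("E3", "M"): [39, 33],
}


def one_way_anova(groups):
    """Return (ss_between, ss_error, df_between, df_error, f_value)."""
    all_values = [y for g in groups for y in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    # Between-groups SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_error = ss_total - ss_between
    df_between = len(groups) - 1
    df_error = n - len(groups)
    f_value = (ss_between / df_between) / (ss_error / df_error)
    return ss_between, ss_error, df_between, df_error, f_value


# Pool the cells by sex, ignoring ethnicity.
females = [y for (_eth, sex), ys in scores.items() if sex == "F" for y in ys]
males = [y for (_eth, sex), ys in scores.items() if sex == "M" for y in ys]

ss_sex, ss_err, df_sex, df_err, f_sex = one_way_anova([females, males])
print(f"Sex:   DF={df_sex}  SS={ss_sex:.3f}  F={f_sex:.2f}")
print(f"Error: DF={df_err}  SS={ss_err:.3f}")
```

Under the assumed layout the printed values agree with the one-way table, which also supports that reading of the data.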

SAS Output for Comprehension Test

Linear Models

The GLM Procedure

Class Level Information
Class       Levels   Values
Ethnicity        3   E1 E2 E3
Sex              2   F M

Number of Observations Read   29
Number of Observations Used   29
 

 

Dependent Variable: Comprehension  
 
 

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5      3365.438697    673.087739     21.46   <.0001
Error             23       721.388889     31.364734
Corrected Total   28      4086.827586
 
   
 
 

R-Square   Coeff Var   Root MSE   Comprehension Mean
0.823484    9.275400   5.600423             60.37931
 

Source          DF     Type I SS   Mean Square   F Value   Pr > F
Ethnicity        2   3060.640086   1530.320043     48.79   <.0001
Sex              1    275.113176    275.113176      8.77   0.0070
Ethnicity*Sex    2     29.685435     14.842718      0.47   0.6289
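
Several entries in the GLM output can be rebuilt from cell and group totals, which may help when reading the tables. The sketch below is illustrative, not the SAS algorithm. It assumes the same cell layout as the data table (rows E1 to E3, columns Females and Males) and uses made-up names such as `cells` and `between_groups_ss`. It recovers the Model and Error sums of squares, R-Square, and the Type I sum of squares for Ethnicity. Because Ethnicity is entered first, its Type I SS is the ordinary between-groups SS for ethnicity alone, and what remains of the Model SS is shared by Sex (adjusted for Ethnicity) and the interaction.

```python
# Comprehension scores by (ethnic group, sex) cell.
# ASSUMPTION: rows of the data table are E1-E3, columns are Females / Males.
cells = {
    ("E1", "F"): [67, 66, 75, 76, 71, 70, 72],
    ("E1", "M"): [63, 72, 62, 61, 69, 64, 71, 68, 56],
    ("E2", "F"): [69, 57, 55, 63, 65, 55],
    ("E2", "M"): [59, 47, 49],
    ("E3", "F"): [30, 47],
    ("E3", "M"): [39, 33],
}

all_scores = [y for ys in cells.values() for y in ys]
n = len(all_scores)
correction = sum(all_scores) ** 2 / n  # (grand total)^2 / n


def between_groups_ss(groups):
    """Sum of (group total)^2 / (group size) over groups, minus the correction term."""
    return sum(sum(g) ** 2 / len(g) for g in groups) - correction


ss_total = sum(y * y for y in all_scores) - correction

# Full two-way model with interaction = one mean per cell.
ss_model = between_groups_ss(cells.values())
ss_error = ss_total - ss_model
df_model = len(cells) - 1
df_error = n - len(cells)
f_model = (ss_model / df_model) / (ss_error / df_error)
r_square = ss_model / ss_total

# Type I SS for Ethnicity (first term): pool the two sexes within each group.
by_ethnicity = {}
for (ethnicity, _sex), ys in cells.items():
    by_ethnicity.setdefault(ethnicity, []).extend(ys)
ss_ethnicity = between_groups_ss(by_ethnicity.values())

# Remainder of the Model SS: Sex (after Ethnicity) plus Ethnicity*Sex.
ss_sex_plus_interaction = ss_model - ss_ethnicity

print(f"Model SS = {ss_model:.4f}, Error SS = {ss_error:.4f}, F = {f_model:.2f}")
print(f"R-Square = {r_square:.6f}")
print(f"Ethnicity Type I SS = {ss_ethnicity:.4f}")
print(f"Sex + interaction SS = {ss_sex_plus_interaction:.4f}")
```

Splitting the remainder into its Sex and Ethnicity*Sex parts requires fitting the additive model, which for unbalanced data like these has no simple closed form; that step is left to the SAS output.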
 

                                                                         

 

Note: E1 is the top line, E2 the middle line and E3 the lowest. (The lines are different colours, but that doesn’t show up if viewed or printed in black and white.)

2. Invertebrates in Mussel Clumps
The following data are from Peake and Quinn (1993), Temporal variation in species-area curves for invertebrates in clumps of an intertidal mussel, Ecography 16, 269-277. The two variables used in this question are:

x = log10(Area) of each of 25 mussel clumps (in dm²), and

Y = number of different species of macroinvertebrates in each clump.

Note: Using log(Area) gives a straighter regression line than Area, which is why it is used. This is a transformation of x, not Y; it has been done to improve linearity, not to stabilise variances.

The data follow. Decide if there is a useful linear relationship between x and Y, i.e. if x is a useful linear predictor of Y.

Clump   logArea   Species
    1      2.71         3
    2      2.67         7
    3      2.66         6
    4      2.97         8
    5      3.13        10
    6      3.25         9
    7      3.23        10
    8      3.25        11
    9      3.49        16
   10      3.60         9
   11      3.65        13
   12      3.65        14
   13      3.70        12
   14      3.65        14
   15      3.74        20
   16      3.87        22
   17      3.85        15
   18      3.96        20
   19      4.01        22
   20      3.97        21
   21      4.14        15
   22      4.31        24
   23      4.39        25
   24      4.43        25
   25      4.42        24


(a)    A scatterplot of the data is given on page 8. Comment on whether you think the plot shows (i) linearity, (ii) constant variance.

(b)   Output from a simple linear regression using logArea to predict the number of species is given on pages 9 and 10. Present a report on this analysis that includes (as usual) the model equation, hypotheses, assumptions, comments on whether the analysis is valid, plus statistical conclusions and interpretation.
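
The simple linear regression in part (b) can be reproduced from the data table with the usual least-squares formulas. The following is a minimal sketch in plain Python; the variable names are chosen for this example and it is not how SAS computes the fit internally. It gives the slope, intercept, error sum of squares, standard error of the slope, t value and R-Square, which can be compared with the REG output for this question. No p-values are computed, because the standard library has no t distribution.

```python
from math import sqrt

# logArea (x) and Species (Y) for the 25 mussel clumps, in clump order.
log_area = [2.71, 2.67, 2.66, 2.97, 3.13, 3.25, 3.23, 3.25, 3.49, 3.60,
            3.65, 3.65, 3.70, 3.65, 3.74, 3.87, 3.85, 3.96, 4.01, 3.97,
            4.14, 4.31, 4.39, 4.43, 4.42]
species = [3, 7, 6, 8, 10, 9, 10, 11, 16, 9,
           13, 14, 12, 14, 20, 22, 15, 20, 22, 21,
           15, 24, 25, 25, 24]

n = len(species)
x_bar = sum(log_area) / n
y_bar = sum(species) / n

# Centred sums of squares and cross-products.
sxx = sum((x - x_bar) ** 2 for x in log_area)
syy = sum((y - y_bar) ** 2 for y in species)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_area, species))

# Least-squares estimates for Y = b0 + b1 * x + error.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Analysis of variance pieces.
ss_model = slope * sxy
ss_error = syy - ss_model
mse = ss_error / (n - 2)
r_square = ss_model / syy

# Test statistic for H0: slope = 0.
se_slope = sqrt(mse / sxx)
t_slope = slope / se_slope

print(f"intercept = {intercept:.5f}, slope = {slope:.5f}")
print(f"SSE = {ss_error:.5f}, MSE = {mse:.5f}, R-Square = {r_square:.4f}")
print(f"SE(slope) = {se_slope:.5f}, t = {t_slope:.2f}")
```

With one predictor, the square of the slope's t value equals the F value in the Analysis of Variance table, so the two tests always agree.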

SAS Output for Mussel Clumps

[Figure: scatter plot of Species against logArea]

 



 

 

Linear Regression Results

The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: Species

Number of Observations Read   25
Number of Observations Used   25
 

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        868.50179     868.50179    117.85   <.0001
Error             23        169.49821       7.36949
Corrected Total   24       1038.00000
             
   
 
 

Root MSE          2.71468   R-Square   0.8367
Dependent Mean   15.00000   Adj R-Sq   0.8296
Coeff Var        18.09787
 
 
 

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -25.64136          3.78287     -6.78     <.0001
logArea      1             11.20214          1.03189     10.86     <.0001
 

 

 


 

 

 

 

 

 

3. Coarse Woody Debris in Lakes
Christensen et al. (1996, Ecological Applications 6(4), 1143-1149) studied the relationships between coarse woody debris (CWD), shoreline vegetation and lake development in a sample of 16 lakes in North America. Coarse woody debris is useful in providing a habitat for various fish species. It is known to be related to the riparian (river-bank, lake-edge) tree density, irrespective of whether or not humans are present. The objective is to find out whether, after allowing for riparian tree density, human habitation is having an effect on the CWD.

The variables below were taken around the shoreline and near-shore water:

L10CABIN = log10 of (1 + density of cabins) (number km⁻¹),

RIP.DENS = density of riparian trees (trees km⁻¹), and

CWD.BASA = basal area of coarse woody debris (m² km⁻¹).

LAKE         AREA   RIP.DENS   CWD.BASA   L10CABIN
Bay            69       1270        121   0
Bergner         9       1210         41   0
Crampton       24       1800        183   0
Long            8       1875        130   0
Roach          20       1300        127   0
Tenderfoot    175       2150        134   0.20412
Palmer        254       1330         65   0.462398
Street         22        964         52   0.6627578
Laura         240        961         12   0.7075702
Annabelle      85       1400         46   0.763428
Joyce          12       1280         54   0.845098
Lake hills     25        976         97   0.8864907
Towanda        58        771          1   1.10721
Black oak     234        833          4   1.1238516
Johnson        31        883          1   1.2552725
Arrowhead      40        956          4   1.40824

(a)    Let Y = CWD.BASA, X1 = RIP.DENS and X2 = L10CABIN. Plots of Y vs. X1, Y vs. X2 and X1 vs. X2 are given on page 12. Comment on any relationships you see.

(b)   SAS output for the following models is presented on pages 13 to 16. Diagnostic graphs are shown for the last model.

i. Regression of Y on the predictor X1

ii. Regression of Y on the predictor X2

iii. Regression of Y on the two predictors X1 and X2

For each analysis above, present the model equation, hypotheses and conclusions. For the third analysis, comment on whether or not the model assumptions are satisfied.

(c)    Which of the hypothesis tests from the three presented models gives the answer to the question of interest in this situation? Explain the answer.
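
The three fits listed in part (b) can be reproduced from the lake data without any statistics library. The sketch below is one possible way to do it, not the method SAS uses: it forms centred sums of squares and cross-products and, for the two-predictor model, solves the 2 x 2 normal equations by Cramer's rule. The names `lakes`, `cross` and the coefficient variables are made up for this example.

```python
# (RIP.DENS, L10CABIN, CWD.BASA) for the 16 lakes, in table order.
lakes = [
    (1270, 0.0, 121), (1210, 0.0, 41), (1800, 0.0, 183), (1875, 0.0, 130),
    (1300, 0.0, 127), (2150, 0.20412, 134), (1330, 0.462398, 65),
    (964, 0.6627578, 52), (961, 0.7075702, 12), (1400, 0.763428, 46),
    (1280, 0.845098, 54), (976, 0.8864907, 97), (771, 1.10721, 1),
    (833, 1.1238516, 4), (883, 1.2552725, 1), (956, 1.40824, 4),
]
x1 = [row[0] for row in lakes]  # RIP.DENS
x2 = [row[1] for row in lakes]  # L10CABIN
y = [row[2] for row in lakes]   # CWD.BASA


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Centred sum of cross-products: sum((u - u_bar) * (v - v_bar))."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y, syy = cross(x1, y), cross(x2, y), cross(y, y)

# Models i and ii: one predictor each.
slope_x1 = s1y / s11
slope_x2 = s2y / s22

# Model iii: both predictors, solved from the centred normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
sse_full = syy - (b1 * s1y + b2 * s2y)

print(f"i.   slope for RIP.DENS alone: {slope_x1:.5f}")
print(f"ii.  slope for L10CABIN alone: {slope_x2:.5f}")
print(f"iii. b0 = {b0:.5f}, b1 = {b1:.5f}, b2 = {b2:.5f}, SSE = {sse_full:.1f}")
```

Each slope shrinks towards zero when the other predictor is added, which is what one expects when the two predictors are themselves correlated (here `s12` is negative).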



Scatterplots: CWD by L10CABIN, CWD by RIP.DENS, RIP.DENS by L10CABIN

Coarse Woody Debris SAS Output

Linear Regression Results
The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            32054         32054     24.30   0.0002
Error             14            18466    1318.96866
Corrected Total   15            50520

Root MSE         36.31761   R-Square   0.6345
Dependent Mean   67.00000   Adj R-Sq   0.6084
Coeff Var        54.20539

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -77.09908         30.60801     -2.52     0.0246
RIP.DENS     1              0.11552          0.02343      4.93     0.0002
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Linear Regression Results
The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            32840         32840     26.00   0.0002
Error             14            17680    1262.86950
Corrected Total   15            50520

Root MSE         35.53688   R-Square   0.6500
Dependent Mean   67.00000   Adj R-Sq   0.6250
Coeff Var        53.04011

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            121.96875         13.96871      8.73     <.0001
L10CABIN     1            -93.30142         18.29646     -5.10     0.0002
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 

 

Linear Regression Results
The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2            38041         19020     19.81   0.0001
Error             13            12479     959.93185
Corrected Total   15            50520

Root MSE         30.98277   R-Square   0.7530
Dependent Mean   67.00000   Adj R-Sq   0.7150
Coeff Var        46.24294

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             18.16485         46.22822      0.39     0.7007
RIP.DENS     1              0.06572          0.02823      2.33     0.0367
L10CABIN     1            -56.26481         22.53059     -2.50     0.0267
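
A coefficient t test in a multiple regression can also be read as an extra-sum-of-squares F test: it compares the error sum of squares of the model without that predictor against the model with it. The short sketch below illustrates this for L10CABIN using only numbers printed in the SAS output (mean squares, degrees of freedom, estimate and standard error); the variable names are made up for this example. Error sums of squares are rebuilt as mean square times degrees of freedom, because the printed sums of squares are rounded to whole numbers.

```python
# Error mean squares and degrees of freedom as printed in the REG output.
mse_rip_only = 1318.96866   # model with RIP.DENS only
df_rip_only = 14
mse_both = 959.93185        # model with RIP.DENS and L10CABIN
df_both = 13

# Recover the unrounded error sums of squares.
sse_rip_only = mse_rip_only * df_rip_only
sse_both = mse_both * df_both

# Extra-sum-of-squares F for adding L10CABIN to a model that already has RIP.DENS.
extra_df = df_rip_only - df_both
partial_f = ((sse_rip_only - sse_both) / extra_df) / mse_both

# t value for L10CABIN in the two-predictor model, from estimate / standard error.
t_cabin = -56.26481 / 22.53059

print(f"partial F = {partial_f:.4f} on ({extra_df}, {df_both}) df")
print(f"t = {t_cabin:.4f}, t squared = {t_cabin ** 2:.4f}")
```

The two statistics coincide up to rounding in the printed output, so the p-value reported beside L10CABIN in the two-predictor model is also the p-value of this F test.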
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 



 

 

4. Age of Teeth
In forensic work, scientists estimate the age of a skeleton by counting teeth cementum annulation (i.e. growth rings). Two teeth preparation methods, A and B, are compared by estimating the ages (Y) of twenty teeth of known age (X). The teeth are randomly allocated to the two methods, ten to each, as follows.

Method A
X = true age        49  13  38  55  44  56   7  66  18  39
Y = estimated age   50  14  38  57  44  55   7  63  20  38

Method B
X = true age        51  59  32  37  12  38   4  28  58  24
Y = estimated age   51  59  29  34  10  35   5  25  57  22

A confirmatory analysis using a model with terms True Age (i.e. X), Method and True Age×Method is required.

(a)    Give the model equation for the required confirmatory analysis.

(b)   SAS output from a fitted model is given on pages 18 to 20. Present a report on this analysis that includes any necessary assumptions, comments on their validity, hypotheses, statistical conclusions at a 5% significance level, and interpretation plus discussion.


 

 

Linear Models

The GLM Procedure

Class Level Information
Class    Levels   Values
Method        2   A B

Number of Observations Read   20
Number of Observations Used   20
 

 

Dependent Variable: Y  
 
 

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      6543.664660   2181.221553    946.16   <.0001
Error             16        36.885340      2.305334
Corrected Total   19      6580.550000
             
             
 
 

R-Square   Coeff Var   Root MSE   Y Mean
0.994395    4.258997   1.518333   35.65000
 

Source      DF     Type I SS   Mean Square   F Value   Pr > F
X            1   6525.535206   6525.535206   2830.62   <.0001
Method       1     15.413619     15.413619      6.69   0.0199
X*Method     1      2.715836      2.715836      1.18   0.2938
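
Type I (sequential) sums of squares measure how much the error sum of squares drops as each term is added in the order listed. For this design that means comparing three nested fits: one common line, two parallel lines, and two separate lines. The sketch below rebuilds the Type I table from the data in plain Python. It is an illustration under that nested-model reading, not the SAS implementation, and the names `centred`, `sse_line` and so on are made up for this example.

```python
# True age (X) and estimated age (Y) for the ten teeth under each method.
x_a = [49, 13, 38, 55, 44, 56, 7, 66, 18, 39]
y_a = [50, 14, 38, 57, 44, 55, 7, 63, 20, 38]
x_b = [51, 59, 32, 37, 12, 38, 4, 28, 58, 24]
y_b = [51, 59, 29, 34, 10, 35, 5, 25, 57, 22]


def centred(x, y):
    """Return (Sxx, Sxy, Syy), the centred sums of squares and cross-products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((u - x_bar) ** 2 for u in x)
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    syy = sum((v - y_bar) ** 2 for v in y)
    return sxx, sxy, syy


def sse_line(sxx, sxy, syy):
    """Error SS after fitting one straight line with the given centred sums."""
    return syy - sxy ** 2 / sxx


n_total = len(x_a) + len(x_b)

# Fit 1: one common line for all 20 teeth (term X only).
sxx_t, sxy_t, syy_t = centred(x_a + x_b, y_a + y_b)
sse_common = sse_line(sxx_t, sxy_t, syy_t)

# Fit 2: parallel lines (X + Method): pooled within-method slope.
within = [centred(x_a, y_a), centred(x_b, y_b)]
sxx_w = sum(w[0] for w in within)
sxy_w = sum(w[1] for w in within)
syy_w = sum(w[2] for w in within)
sse_parallel = sse_line(sxx_w, sxy_w, syy_w)

# Fit 3: separate lines (X + Method + X*Method).
sse_separate = sum(sse_line(*w) for w in within)

# Sequential sums of squares: the drop in error SS at each step.
ss_x = syy_t - sse_common
ss_method = sse_common - sse_parallel
ss_interaction = sse_parallel - sse_separate

mse = sse_separate / (n_total - 4)  # 4 parameters in the separate-lines model
for name, ss in [("X", ss_x), ("Method", ss_method), ("X*Method", ss_interaction)]:
    print(f"{name:9s} SS = {ss:12.6f}  F = {ss / mse:8.2f}")
print(f"Error     SS = {sse_separate:12.6f}  MSE = {mse:.6f}")
```

The interaction line is the same in the Type I and Type III tables because X*Method is the last term in both orderings; only X and Method change when each is adjusted for the other terms.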
 

Source      DF   Type III SS   Mean Square   F Value   Pr > F
X            1   6350.006837   6350.006837   2754.48   <.0001
Method       1     10.463729     10.463729      4.54   0.0490
X*Method     1      2.715836      2.715836      1.18   0.2938
 

 

 

 

 

Data and fitted lines, plotted against true age: Method A line (dashed) is above Method B line (solid)
