STAT292 Assignment 4: Linear Regression


There are four questions, worth a total of 100 marks. Question 1 starts on page 2.

Assignment Guidelines (once more)
You are encouraged to discuss assignments with other students, but your submitted work must be your own.

The following Assignment Guidelines are helpful for all the assignments in Parts 2 and 3 of the course.

When you carry out a statistical test of hypothesis, you should state the following, when relevant:

•   Model equation.

•   Assumptions about the data, and comments about whether diagnostic graphs support those assumptions.

•   Null and alternative hypotheses.

•   ANOVA Table (if relevant), p-values.

•   Statistical conclusions. For example, “We reject H0 and conclude HA, that µ1 and µ2 differ at the 5% significance level”.

•   Interpretation of the statistical conclusions back to the original problem, using the original meaning of the response variable and any factors or covariates. For example, if comparing heights of two groups, “Female and male adults have different mean heights, with males being taller on average”.


1. Comprehension Test
Children in a school class are given a test of comprehension of English, marked out of 100. The children come from three different ethnic groups, and ethnicity is thought to be an important factor. The question of interest is whether there are sex differences in comprehension after allowing for ethnicity. The data follow:

Ethnic group   Females                  Males
E1             67 66 75 76 71 70 72     63 72 62 61 69 64 71 68 56
E2             69 57 55 63 65 55        59 47 49
E3             30 47                    39 33

(a)    A two-way ANOVA was run on the data, with SAS output given on pages 3 to 6. Present the results from the ANOVA following the usual Assignment Guidelines, as given on page 1.

(b)   If a one-way ANOVA is done with factor Sex, the resulting ANOVA table is:

Source   DF   Sum of Squares   Mean Square   F value   p-value
Sex       1          144.166       144.166      0.99    0.3292
Error    27         3942.662       146.025
Total    28         4086.828

Briefly discuss the outcomes of the separate tests for Sex presented in parts (a) and (b). Are the conclusions different? Give reasons to explain your answer.
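The difference between the two Sex tests can be explored by refitting the models by least squares. The sketch below is a minimal illustration in Python, assuming NumPy is available; the dummy coding (E1 and F as baseline levels) and the helper `rss` are illustrative choices, not part of the SAS analysis. It computes the sum of squares for Sex when Sex is the only term, as in part (b), and when Sex is entered after Ethnicity, as in the Type I SS of part (a), along with the cell sizes and means.

```python
import numpy as np

# Scores from the Comprehension Test table, keyed by (ethnic group, sex).
scores = {
    ("E1", "F"): [67, 66, 75, 76, 71, 70, 72],
    ("E1", "M"): [63, 72, 62, 61, 69, 64, 71, 68, 56],
    ("E2", "F"): [69, 57, 55, 63, 65, 55],
    ("E2", "M"): [59, 47, 49],
    ("E3", "F"): [30, 47],
    ("E3", "M"): [39, 33],
}

eth, sex, y = [], [], []
for (group, s), values in scores.items():
    eth += [group] * len(values)
    sex += [s] * len(values)
    y += values
eth, sex, y = np.array(eth), np.array(sex), np.array(y, dtype=float)


def rss(*columns):
    """Residual SS after regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# 0/1 indicator (dummy) columns; E1 and F are the baseline levels.
e2 = (eth == "E2").astype(float)
e3 = (eth == "E3").astype(float)
male = (sex == "M").astype(float)

rss_null = rss()                                  # intercept only
rss_eth = rss(e2, e3)                             # Ethnicity
rss_add = rss(e2, e3, male)                       # Ethnicity + Sex
rss_full = rss(e2, e3, male, e2 * male, e3 * male)  # plus Ethnicity*Sex

ss_sex_alone = rss_null - rss(male)    # part (b): Sex is the only term
ss_sex_after_eth = rss_eth - rss_add   # part (a): Type I SS for Sex after Ethnicity
print(f"SS(Sex) alone:           {ss_sex_alone:.3f}")
print(f"SS(Sex | Ethnicity):     {ss_sex_after_eth:.3f}")
print(f"Error SS, two-way model: {rss_full:.3f}")

# Cell sizes and means, showing how the sexes are spread across ethnic groups.
for (group, s), values in scores.items():
    print(group, s, len(values), round(np.mean(values), 2))
```

The two Sex sums of squares should come out near 144.166 and 275.113, matching the tables. The error mean square also falls from 146.025 in the one-way model to 31.365 in the two-way model, so Sex is tested against much less unexplained variation in part (a).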

                                        SAS Output for Comprehension Test

Linear Models

The GLM Procedure

Class Level Information
Class       Levels   Values
Ethnicity   3        E1 E2 E3
Sex         2        F M

Number of Observations Read   29
Number of Observations Used   29

Dependent Variable: Comprehension

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5      3365.438697    673.087739     21.46   <.0001
Error             23       721.388889     31.364734
Corrected Total   28      4086.827586

R-Square   Coeff Var   Root MSE   Comprehension Mean
0.823484   9.275400    5.600423   60.37931

Source          DF   Type I SS     Mean Square   F Value   Pr > F
Ethnicity        2   3060.640086   1530.320043     48.79   <.0001
Sex              1    275.113176    275.113176      8.77   0.0070
Ethnicity*Sex    2     29.685435     14.842718      0.47   0.6289
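The derived columns in these SAS tables follow directly from the sums of squares and degrees of freedom. As a small arithmetic check (Python, standard library only; the variable names are illustrative), each mean square is SS/DF, each F value is that mean square divided by the error mean square, R-Square is Model SS / Corrected Total SS, Root MSE is the square root of the error mean square, and Coeff Var is 100 × Root MSE / mean.

```python
# Values copied from the two-way ANOVA output above.
type1_ss = {"Ethnicity": 3060.640086, "Sex": 275.113176, "Ethnicity*Sex": 29.685435}
type1_df = {"Ethnicity": 2, "Sex": 1, "Ethnicity*Sex": 2}
ss_model, ss_error, ss_total = 3365.438697, 721.388889, 4086.827586
df_error = 23
y_mean = 60.37931

mse = ss_error / df_error  # error mean square
f_values = {term: (type1_ss[term] / type1_df[term]) / mse for term in type1_ss}
r_square = ss_model / ss_total
root_mse = mse ** 0.5
coeff_var = 100 * root_mse / y_mean

# Type I (sequential) sums of squares add up to the model SS.
print(round(sum(type1_ss.values()), 6), ss_model)
for term, f in f_values.items():
    print(f"{term:14s} F = {f:.2f}")
print(f"R-Square = {r_square:.6f}, Root MSE = {root_mse:.6f}, Coeff Var = {coeff_var:.4f}")
```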



[Plot from the SAS output not reproduced.]

Note: E1 is the top line, E2 the middle line and E3 the lowest. (The lines are different colours, but that doesn't show up if viewed or printed in black and white.)

2. Invertebrates in Mussel Clumps
The following data are from Peake and Quinn (1993), Temporal variation in species-area curves for invertebrates in clumps of an intertidal mussel, Ecography 16, 269-277. The two variables used in this question are:

x = log10(Area) of each of 25 mussel clumps (Area in dm²), and

Y = number of different species of macroinvertebrates in each clump.

Note: Using log(Area) gives a straighter regression line than Area, which is why it is used. This is a transformation of x, not Y; it has been done to improve linearity, not to stabilise variances.

The data follow. Decide if there is a useful linear relationship between x and Y, i.e. if x is a useful linear predictor of Y.

Clump   logArea   Species
 1      2.71       3
 2      2.67       7
 3      2.66       6
 4      2.97       8
 5      3.13      10
 6      3.25       9
 7      3.23      10
 8      3.25      11
 9      3.49      16
10      3.60       9
11      3.65      13
12      3.65      14
13      3.70      12
14      3.65      14
15      3.74      20
16      3.87      22
17      3.85      15
18      3.96      20
19      4.01      22
20      3.97      21
21      4.14      15
22      4.31      24
23      4.39      25
24      4.43      25
25      4.42      24
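To see where the regression estimates in the SAS output come from, the least-squares fit can be computed from the table above with the usual formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. This is a sketch in Python, assuming NumPy is available; the variable names are illustrative. It also computes the slope's standard error, t value, F value and R-Square, which should agree with the SAS output up to rounding.

```python
import numpy as np

# logArea (x) and number of species (Y) for the 25 mussel clumps, in clump order.
log_area = np.array([2.71, 2.67, 2.66, 2.97, 3.13, 3.25, 3.23, 3.25, 3.49, 3.60,
                     3.65, 3.65, 3.70, 3.65, 3.74, 3.87, 3.85, 3.96, 4.01, 3.97,
                     4.14, 4.31, 4.39, 4.43, 4.42])
species = np.array([3, 7, 6, 8, 10, 9, 10, 11, 16, 9,
                    13, 14, 12, 14, 20, 22, 15, 20, 22, 21,
                    15, 24, 25, 25, 24], dtype=float)

n = len(species)
sxx = np.sum((log_area - log_area.mean()) ** 2)
sxy = np.sum((log_area - log_area.mean()) * (species - species.mean()))

b1 = sxy / sxx                               # least-squares slope
b0 = species.mean() - b1 * log_area.mean()   # least-squares intercept

fitted = b0 + b1 * log_area
sse = np.sum((species - fitted) ** 2)            # error SS, n - 2 df
sst = np.sum((species - species.mean()) ** 2)    # corrected total SS
ssr = sst - sse                                  # model SS, 1 df
mse = sse / (n - 2)

se_b1 = np.sqrt(mse / sxx)
t_b1 = b1 / se_b1
f_value = ssr / mse          # equals t_b1 ** 2 in simple linear regression
r_square = ssr / sst

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}, se(b1) = {se_b1:.5f}")
print(f"t = {t_b1:.2f}, F = {f_value:.2f}, R^2 = {r_square:.4f}")
```

In simple linear regression the model F test and the t test for the slope are equivalent, since F = t² (117.85 versus 10.86² ≈ 117.9, the gap being rounding in the printed t value).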

(a)    A scatterplot of the data is given on page 8. Comment on whether you think the plot shows (i) linearity, (ii) constant variance.

(b)   Output from a simple linear regression using logArea to predict the number of species is given on pages 9 and 10. Present a report on this analysis that includes (as usual) the model equation, hypotheses, assumptions, comments on whether the analysis is valid, plus statistical conclusions and interpretation.

                                         SAS Output for Mussel Clumps

[Figure: scatter plot of Species against logArea; not reproduced]

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: Species

Number of Observations Read   25
Number of Observations Used   25

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        868.50179     868.50179    117.85   <.0001
Error             23        169.49821       7.36949
Corrected Total   24       1038.00000

Root MSE          2.71468    R-Square   0.8367
Dependent Mean   15.00000    Adj R-Sq   0.8296
Coeff Var        18.09787

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -25.64136          3.78287     -6.78     <.0001
logArea      1             11.20214          1.03189     10.86     <.0001


3. Coarse Woody Debris in Lakes
Christensen et al. (1996, Ecological Applications 6(4), 1143-1149) studied the relationships between coarse woody debris (CWD), shoreline vegetation and lake development in a sample of 16 lakes in North America. Coarse woody debris is useful in providing a habitat for various fish species. It is known to be related to the riparian (river-bank, lake-edge) tree density, irrespective of whether or not humans are present. The objective is to find out whether, after allowing for riparian tree density, human habitation is having an effect on the CWD.

The variables below were taken around the shoreline and near-shore water:

L10CABIN = log10(1 + density of cabins), with cabin density in number km⁻¹,

RIP.DENS = density of riparian trees (trees km⁻¹), and

CWD.BASA = basal area of coarse woody debris (m² km⁻¹).

LAKE         AREA   RIP.DENS   CWD.BASA   L10CABIN
Bay            69       1270        121   0
Bergner         9       1210         41   0
Crampton       24       1800        183   0
Long            8       1875        130   0
Roach          20       1300        127   0
Tenderfoot    175       2150        134   0.20412
Palmer        254       1330         65   0.462398
Street         22        964         52   0.6627578
Laura         240        961         12   0.7075702
Annabelle      85       1400         46   0.763428
Joyce          12       1280         54   0.845098
Lake hills     25        976         97   0.8864907
Towanda        58        771          1   1.10721
Black oak     234        833          4   1.1238516
Johnson        31        883          1   1.2552725
Arrowhead      40        956          4   1.40824

(a)    Let Y = CWD.BASA, X1 = RIP.DENS and X2 = L10CABIN. Plots of Y vs. X1, Y vs. X2 and X1 vs. X2 are given on page 12. Comment on any relationships you see.
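The relationships in the three scatterplots can also be summarised by correlation coefficients. A minimal sketch in Python, assuming NumPy is available and using the lake table above (the variable names are illustrative):

```python
import numpy as np

# Columns from the lake table (16 lakes, table order):
# Y = CWD.BASA, X1 = RIP.DENS, X2 = L10CABIN.
cwd = np.array([121, 41, 183, 130, 127, 134, 65, 52, 12, 46, 54, 97, 1, 4, 1, 4],
               dtype=float)
rip = np.array([1270, 1210, 1800, 1875, 1300, 2150, 1330, 964, 961, 1400,
                1280, 976, 771, 833, 883, 956], dtype=float)
cabin = np.array([0, 0, 0, 0, 0, 0.20412, 0.462398, 0.6627578, 0.7075702, 0.763428,
                  0.845098, 0.8864907, 1.10721, 1.1238516, 1.2552725, 1.40824])

# Pairwise Pearson correlations; rows and columns in the order Y, X1, X2.
r = np.corrcoef(np.vstack([cwd, rip, cabin]))
print(np.round(r, 3))
```

The squared correlations of Y with X1 and with X2 equal the R-Square values (0.6345 and 0.6500) of the two single-predictor regressions in the SAS output. The two predictors are also correlated with each other, which is worth keeping in mind when comparing the L10CABIN coefficient across models.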

(b)   SAS output for the following models is presented on pages 13 to 16. Diagnostic graphs are shown for the last model.

i. Regression of Y on the predictor X1

ii. Regression of Y on the predictor X2

iii. Regression of Y on the two predictors X1 and X2

For each analysis above, present the model equation, hypotheses and conclusions. For the third analysis, comment on whether or not the model assumptions are satisfied.

(c)    Which of the hypothesis tests from the three presented models gives the answer to the question of interest in this situation? Explain the answer.
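One way to see what the L10CABIN t test in model iii measures is to compare models i and iii directly. A minimal sketch using only numbers printed in the SAS output (Python, standard library only; variable names are illustrative): the drop in error SS when L10CABIN is added to a model that already contains RIP.DENS gives a 1-df partial F statistic, which for a single added term equals the square of that term's t value, up to rounding in the printed output.

```python
# Printed values from the SAS output: model i (RIP.DENS only) and model iii (RIP.DENS and L10CABIN).
sse_rip_only = 18466          # Error SS, model i, 14 df
sse_both = 12479              # Error SS, model iii, 13 df
mse_both = 959.93185          # Error mean square, model iii
b_cabin, se_cabin = -56.26481, 22.53059   # L10CABIN estimate and standard error, model iii

# Extra sum of squares for L10CABIN once RIP.DENS is already in the model (1 df).
extra_ss = sse_rip_only - sse_both
partial_f = (extra_ss / 1) / mse_both

# With one added term, the partial F equals the square of that term's t statistic.
t_cabin = b_cabin / se_cabin
print(f"extra SS = {extra_ss}, partial F = {partial_f:.2f}, t^2 = {t_cabin ** 2:.2f}")
```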

[Figure: scatterplots of CWD by L10CABIN, CWD by RIP.DENS, and RIP.DENS by L10CABIN; not reproduced]

                               Coarse Woody Debris        SAS Output

Linear Regression Results
The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            32054         32054     24.30   0.0002
Error             14            18466    1318.96866
Corrected Total   15            50520

Root MSE          36.31761   R-Square   0.6345
Dependent Mean    67.00000   Adj R-Sq   0.6084
Coeff Var         54.20539

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -77.09908         30.60801     -2.52     0.0246
RIP.DENS     1              0.11552          0.02343      4.93     0.0002

Linear Regression Results
The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            32840         32840     26.00   0.0002
Error             14            17680    1262.86950
Corrected Total   15            50520

Root MSE          35.53688   R-Square   0.6500
Dependent Mean    67.00000   Adj R-Sq   0.6250
Coeff Var         53.04011

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            121.96875         13.96871      8.73     <.0001
L10CABIN     1            -93.30142         18.29646     -5.10     0.0002

Linear Regression Results
The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: CWD.BASA

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2            38041         19020     19.81   0.0001
Error             13            12479     959.93185
Corrected Total   15            50520

Root MSE          30.98277   R-Square   0.7530
Dependent Mean    67.00000   Adj R-Sq   0.7150
Coeff Var         46.24294

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             18.16485         46.22822      0.39     0.7007
RIP.DENS     1              0.06572          0.02823      2.33     0.0367
L10CABIN     1            -56.26481         22.53059     -2.50     0.0267

4. Age of Teeth
In forensic work, scientists estimate the age of a skeleton by counting tooth cementum annulations (i.e. growth rings). Two tooth-preparation methods, A and B, are compared by estimating the ages (Y) of twenty teeth of known age (X). The teeth are randomly allocated to the two methods, ten to each, as follows.

Method A
X = true age        49  13  38  55  44  56   7  66  18  39
Y = estimated age   50  14  38  57  44  55   7  63  20  38

Method B
X = true age        51  59  32  37  12  38   4  28  58  24
Y = estimated age   51  59  29  34  10  35   5  25  57  22

A confirmatory analysis using a model with terms True Age (i.e. X), Method and True Age×Method is required.

(a)    Give the model equation for the required confirmatory analysis.

(b)   SAS output from a fitted model is given on pages 18 to 20. Present a report on this analysis that includes any necessary assumptions, comments on their validity, hypotheses, statistical conclusions at a 5% significance level, and interpretation plus discussion.
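A sketch of how the Type I sums of squares in the output below arise, assuming NumPy is available and using a 0/1 dummy for Method B (the coding and the helper `rss` are illustrative, not SAS's internal parameterisation). Each Type I SS is the drop in residual SS as X, Method and X×Method are added in that order. Because the model with X, Method and X×Method lets each method have its own intercept and slope, the code also fits a separate straight line to each method.

```python
import numpy as np

# True age (X) and estimated age (Y) for the ten teeth under each method.
x_a = [49, 13, 38, 55, 44, 56, 7, 66, 18, 39]
y_a = [50, 14, 38, 57, 44, 55, 7, 63, 20, 38]
x_b = [51, 59, 32, 37, 12, 38, 4, 28, 58, 24]
y_b = [51, 59, 29, 34, 10, 35, 5, 25, 57, 22]

x = np.array(x_a + x_b, dtype=float)
y = np.array(y_a + y_b, dtype=float)
is_b = np.array([0.0] * 10 + [1.0] * 10)   # dummy: 0 = Method A, 1 = Method B


def rss(*columns):
    """Residual SS after least-squares regression of y on an intercept plus the columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Sequential (Type I) sums of squares in the order X, Method, X*Method.
rss0 = rss()
rss1 = rss(x)
rss2 = rss(x, is_b)
rss3 = rss(x, is_b, x * is_b)
type1 = {"X": rss0 - rss1, "Method": rss1 - rss2, "X*Method": rss2 - rss3}
mse = rss3 / (len(y) - 4)   # 20 observations, 4 parameters in the full model
for term, ss in type1.items():
    print(f"{term:9s} SS = {ss:10.4f}  F = {ss / mse:8.2f}")

# Separate least-squares line for each method (same fitted values as the full model).
for name, xs, ys in [("A", x_a, y_a), ("B", x_b, y_b)]:
    slope, intercept = np.polyfit(xs, ys, 1)
    print(f"Method {name}: Y = {intercept:.3f} + {slope:.4f} X")
```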

Linear Models

The GLM Procedure

Class Level Information
Class    Levels   Values
Method   2        A B

Number of Observations Read   20
Number of Observations Used   20

Dependent Variable: Y

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      6543.664660   2181.221553    946.16   <.0001
Error             16        36.885340      2.305334
Corrected Total   19      6580.550000

R-Square   Coeff Var   Root MSE   Y Mean
0.994395   4.258997    1.518333   35.65000

Source     DF   Type I SS     Mean Square   F Value   Pr > F
X           1   6525.535206   6525.535206   2830.62   <.0001
Method      1     15.413619     15.413619      6.69   0.0199
X*Method    1      2.715836      2.715836      1.18   0.2938

Source     DF   Type III SS   Mean Square   F Value   Pr > F
X           1   6350.006837   6350.006837   2754.48   <.0001
Method      1     10.463729     10.463729      4.54   0.0490
X*Method    1      2.715836      2.715836      1.18   0.2938

[Figure: data and fitted lines, estimated age against true age; the Method A line (dashed) is above the Method B line (solid).]
