$20
1. Consider the prostate cancer dataset available on eLearning as prostate cancer.csv. It consists of data on 97 men with advanced prostate cancer. A description of the variables is given in Figure 1. We would like to understand how PSA level is related to the other predictors in the dataset. Note that vesinv is a qualitative variable. You can treat gleason as a quantitative variable.
Build a “reasonably good” linear model for these data, taking PSA level as the response variable. Carefully justify all the choices you make in building the model, and be sure to verify the model assumptions. If a transformation of the response is necessary, try the natural log transformation. Use the final model to predict the PSA level for a patient whose quantitative predictors are at their sample means and whose qualitative predictors are at their most frequent category.
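The prediction step at the end of the task can be sketched as follows. This is only an illustrative sketch, not a prescribed solution: the helper names `prediction_point` and `predict_psa` are hypothetical, the rows are assumed to be dictionaries keyed by the column headers of Figure 1, and the coefficients are assumed to come from whatever final model you fit (with vesinv entering as a 0/1 dummy). If the response was log-transformed, the linear predictor is exponentiated to return to the PSA scale.

```python
import math
from collections import Counter
from statistics import mean

# Column roles as stated in the problem: vesinv is qualitative,
# gleason is treated as quantitative.
QUANTITATIVE = ["cancervol", "weight", "age", "benpros", "capspen", "gleason"]
QUALITATIVE = ["vesinv"]


def prediction_point(rows):
    """Sample means for quantitative predictors, modal category for qualitative ones."""
    point = {name: mean(float(r[name]) for r in rows) for name in QUANTITATIVE}
    for name in QUALITATIVE:
        point[name] = Counter(r[name] for r in rows).most_common(1)[0][0]
    return point


def predict_psa(coefs, point, log_response=True):
    """Evaluate a fitted linear model at `point`.

    `coefs` (hypothetical input) maps "Intercept" and each predictor kept in
    the final model to its estimated coefficient. Predictors dropped during
    model selection are simply absent from `coefs`.
    """
    eta = coefs["Intercept"] + sum(
        coefs[name] * float(point[name]) for name in coefs if name != "Intercept"
    )
    # Undo the natural log transformation if the model was fitted to log(psa).
    return math.exp(eta) if log_response else eta
```

Note that exponentiating a prediction made on the log scale estimates the median PSA level at that point rather than the mean, which is worth stating when reporting the result.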
| header | name | description |
|---|---|---|
| subject | ID | 1 to 97 |
| psa | PSA level | Serum prostate-specific antigen level (mg/ml) |
| cancervol | Cancer Volume | Estimate of prostate cancer volume (cc) |
| weight | Weight | Prostate weight (gm) |
| age | Age | Age of patient (years) |
| benpros | Benign prostatic hyperplasia | Amount of benign prostatic hyperplasia (cm²) |
| vesinv | Seminal vesicle invasion | Presence (1) or absence (0) of seminal vesicle invasion |
| capspen | Capsular penetration | Degree of capsular penetration (cm) |
| gleason | Gleason score | Pathologically determined grade of disease (6, 7 or 8) |
Figure 1: List of variables in the prostate cancer data
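As a hedged illustration of how the variable roles in Figure 1 might be carried into code, the sketch below reads the file with Python's standard `csv` module, converting the quantitative columns to numbers and keeping vesinv as a category label. The function name `read_prostate` is hypothetical, and the sketch assumes the CSV column headers match the "header" column of Figure 1.

```python
import csv

# Column roles taken from Figure 1; vesinv is the only qualitative predictor.
NUMERIC_COLUMNS = ["psa", "cancervol", "weight", "age", "benpros", "capspen", "gleason"]
CATEGORICAL_COLUMNS = ["vesinv"]


def read_prostate(handle):
    """Parse an open CSV file into a list of dicts with typed values."""
    rows = []
    for record in csv.DictReader(handle):
        # Quantitative variables become floats.
        row = {name: float(record[name]) for name in NUMERIC_COLUMNS}
        # Qualitative variables stay as labels so they are not treated as numbers.
        for name in CATEGORICAL_COLUMNS:
            row[name] = record[name].strip()
        rows.append(row)
    return rows
```

It would be called on an open file handle, for example `read_prostate(open(path, newline=""))`, where `path` points to the CSV file downloaded from eLearning. The subject ID is dropped on purpose, since it is an identifier rather than a predictor.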