GR5243-Homework 2 Solved

About The Data
We will be working with a simulated data set related to electronic health records and long-run outcomes for cardiology patients.

File: ../Data/Homework 2 Data.csv

Delimiter: The columns of the data set are separated by commas (,).

Header: The first row of the data set contains the column names, and each subsequent row contains one observation. Here is a selection of 1000 lines from the data set:

The data is written in long format (i.e. panel data). Each patient’s records are collected over time in one or more rows. Each row corresponds to a period of time, during which the patient’s status is recorded in terms of medications, hospitalizations, and complications. Each patient is followed until either death or the end of the follow-up period.

Here is a brief description of each variable:

·         id: This is a unique identifier for each patient. Because of strict privacy regulations, this identifier is anonymous. All records with the same value of id correspond to the same patient. This patient’s medical history is recorded in all of the rows with this id value. Some patients may have only a single row, while others may have many rows of updates.

·         begin: This is the beginning of the observation interval, defined as the number of days since the patient entered the study (see the definition of age below). The patient’s age at the beginning of the interval is the age variable (in years) plus the begin variable converted from days to years.

·         end: This is the end of the observation interval, defined as the number of days since the patient entered the study (see the definition of age below). The observation interval is half open: the begin date is included, while the end date is excluded. For patients with more than one row of records, the beginning of the next row should equal the end of the previous row. Any mismatch between these values is a gap in coverage, a period for which we lack records on the patient. (For instance, if a patient switches insurance companies and then switches back, we might lose a year’s worth of records.) The length of the interval in one row is therefore end - begin days. The patient’s age at the end of the interval is the age variable (in years) plus the end variable converted from days to years.

·         age: This is the patient’s age in (rounded) years at the time of entry into the study – at the first diagnosis of coronary heart disease. For patients with multiple records in different rows, the age should be the same in every entry. For the purpose of this study, all of the patients should be at least 18 years old.

·         diabetes: This is an indicator of whether the patient had a diagnosed case of diabetes mellitus.

·         hypertension: This is an indicator of whether the patient had a diagnosed case of hypertension.

·         kidney_disease: This is an indicator of whether the patient had a diagnosed case of kidney disease.

·         ace: This is an indicator of adherence to ACE Inhibitors, a common cardiovascular drug. This information is based on a self-reported log that tracks the patient’s daily usage of the medicine. The values of ace are coded as:
1: Possession;
0: No possession.

·         beta.blocker: This is an indicator of adherence to Beta Blockers, a cardiovascular medicine. It has the same coding as ace.

·         statin: This is an indicator of adherence to Statins, another cardiovascular medicine. It has the same coding as ace and beta.blocker.

·         hospital: This is an indicator of whether the patient was in the hospital during the interval. Its values are coded as:
1: Hospitalized;
0: Not Hospitalized.

·         heart.attack: This is an indicator of whether the patient suffered a heart attack. When this occurs, the patient is assumed to go to the hospital and stay for some period of time (e.g. 1-7 days). The heart attack is assumed to happen at the beginning of the interval, and the remainder of the interval is considered a recovery period. The values are coded as:
1: Suffered a heart attack.
0: No heart attack.

·         death: This is an indicator of the end of the patient’s life. Its values are coded as:
1: End of life.
0: Patient is still alive.

Each patient is followed until either death or the end of the observation period. Many patients with coronary disease were still alive at the end of follow-up.
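The interval rules described for begin, end, and age can be illustrated with a small sketch. It is written in Python purely for illustration (the assignment itself points to R functions), and the rows, the helper names coverage_gaps and age_at, and the tuple layout are all hypothetical.

```python
DAYS_PER_YEAR = 365.25  # the year length the assignment asks for

# Hypothetical rows for one patient: (id, begin, end, age), begin/end in days.
rows = [
    ("A", 0, 30, 64),
    ("A", 30, 45, 64),
    ("A", 410, 500, 64),  # begin != previous end, so days 45..409 are unobserved
]

def coverage_gaps(patient_rows):
    """Return (gap_start, gap_end) pairs where a row's begin differs from the previous end."""
    ordered = sorted(patient_rows, key=lambda r: r[1])
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] != prev[2]:
            gaps.append((prev[2], cur[1]))
    return gaps

def age_at(age_years, day):
    """Age in years at a given study day: entry age plus days converted to years."""
    return age_years + day / DAYS_PER_YEAR

gaps = coverage_gaps(rows)
print(gaps)                       # [(45, 410)]
print(round(age_at(64, 500), 2))  # 65.37
```

Because the intervals are half open, a row that begins exactly where the previous one ended leaves no gap.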

Note: The description above tells you the intended structure of the data set. However, it’s possible that there could be problems lurking in the records. In the course of doing this assignment, you may uncover some issues. For instance, you may find an erroneous value in some of the variables. In this circumstance, it will be necessary to resolve the situation. Here are some guidelines for doing so:

·         If the issue has an obvious solution, then you may recode the data. For instance, if you see a value of TRUE for the heart.attack variable, then you may safely assume that this value should have been coded as a 1.

·         If the issue does not have an obvious solution, then you can replace the erroneous value with NA to denote a missing value.

In either circumstance, note the problem in your solution and briefly describe the work you did to clean the data.
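As a rough illustration of these two guidelines, the sketch below recodes an indicator column. It is Python rather than the R tooling the assignment refers to, the helper name clean_indicator is hypothetical, None stands in for NA, and treating FALSE as 0 is an assumption made by analogy with the TRUE example.

```python
def clean_indicator(value):
    """Map a raw cell to 1, 0, or None (standing in for NA)."""
    text = str(value).strip().upper()
    if text in {"1", "TRUE"}:
        return 1   # obvious fix: TRUE was meant to be 1
    if text in {"0", "FALSE"}:
        return 0   # assumption: FALSE was meant to be 0
    return None    # no obvious fix: treat as missing

# Hypothetical raw values for a single indicator column.
raw = ["1", "0", "TRUE", "False", "7", ""]
cleaned = [clean_indicator(v) for v in raw]
print(cleaned)  # [1, 0, 1, 0, None, None]
```

Whichever rule is applied, counting how many values fall into each branch gives the numbers needed for the written description of the cleaning.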

Question 1: Reading the Data
One way to read data files is with the fread function from R’s data.table package. Read in the data and answer these questions:

·         How many rows are there?

·         How many columns?

·         How many unique patients are there?

·         What are the names of the columns? Do they match up with our description of the data set?
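The counts asked for here reduce to a few simple operations once the file is loaded. The sketch below shows them in Python with the standard csv module instead of fread; the three-row sample and its reduced set of columns are hypothetical stand-ins for the real file.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real CSV file.
sample = io.StringIO(
    "id,begin,end,age,death\n"
    "A,0,30,64,0\n"
    "A,30,45,64,1\n"
    "B,0,100,71,0\n"
)

reader = csv.DictReader(sample)
records = list(reader)

n_rows = len(records)                             # one record per data row
column_names = reader.fieldnames                  # taken from the header row
n_cols = len(column_names)
n_patients = len({rec["id"] for rec in records})  # distinct id values

print(n_rows, n_cols, n_patients)  # 3 5 2
print(column_names)                # ['id', 'begin', 'end', 'age', 'death']
```

Comparing column_names against the variable list in the data description is a quick way to spot renamed or missing columns.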

Question 2: Inspection and Cleaning
Briefly inspect the data. Do you see any potential problems with any of the variables? If so, perform some data cleaning according to the guidelines in the instructions. Briefly describe the work you did and justify any difficult choices you made.

Fill in your work in the subsections below.

Checking begin
Checking end
Checking age
Checking diabetes
Checking hypertension
Checking kidney_disease
Checking ace
Checking beta.blocker
Checking statin
Checking hospital
Checking heart.attack
Checking death

For all subsequent questions, please rely on the clean version of the data that you created.

Question 3: Patient-Level Summaries
For age, diabetes, hypertension, and kidney_disease, what are the average values and standard deviations at baseline? For age, this would be an average in years. For the disease states, this would be the percentage of the population who have these conditions. Display the results in a table. Please round each number to 1 decimal place. For percentages, this should appear in the format of 36.1% rather than 0.361.

Hint: Make sure to only use one entry per id, with a focus on the earliest measured row for each patient. It may help to sort the data by id and begin in increasing order using the setorderv function.
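A minimal sketch of the baseline selection and formatting follows, in Python rather than with setorderv. The rows are hypothetical, only age and diabetes are shown, and the standard deviation is the sample version (dividing by n - 1), which is an assumption about the intended definition.

```python
from statistics import mean, stdev

# Hypothetical cleaned rows: (id, begin, age, diabetes).
rows = [
    ("B", 0, 71, 1),
    ("A", 30, 64, 0),
    ("A", 0, 64, 0),
    ("C", 0, 58, 1),
]

# Sort by id and begin, then keep the first row seen for each id.
baseline = {}
for pid, begin, age, diabetes in sorted(rows, key=lambda r: (r[0], r[1])):
    baseline.setdefault(pid, (age, diabetes))

ages = [v[0] for v in baseline.values()]
diab = [v[1] for v in baseline.values()]

summary = {
    "age": (f"{mean(ages):.1f}", f"{stdev(ages):.1f}"),
    "diabetes": (f"{100 * mean(diab):.1f}%", f"{100 * stdev(diab):.1f}%"),
}
print(summary)
# {'age': ('64.3', '6.5'), 'diabetes': ('66.7%', '57.7%')}
```

The mean of a 0/1 indicator is the proportion of patients with the condition, which is why multiplying by 100 yields the percentage format requested.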

Question 4: Counting Outcomes
Part A
How many heart attacks were there in follow-up? How many deaths occurred?

Part B
How many total hospitalizations occurred across all of the patients? Keep in mind that a single hospitalization may span multiple rows of data. Incorporate this count into the previous table. Compare the value calculated here to the number of rows with hospitalizations.
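One possible way to count hospitalizations rather than hospitalized rows is to count runs of consecutive hospital rows. The Python sketch below assumes that adjacent rows for the same patient belong to one stay when both have hospital equal to 1 and the second begins where the first ends; that definition, the sample rows, and the function name count_hospitalizations are assumptions for illustration.

```python
# Hypothetical cleaned rows: (id, begin, end, hospital), sorted by id and begin.
rows = [
    ("A", 0, 10, 0),
    ("A", 10, 13, 1),
    ("A", 13, 15, 1),   # continues the stay that started on day 10
    ("A", 15, 90, 0),
    ("A", 90, 92, 1),   # a second, separate stay
    ("B", 0, 4, 1),
    ("B", 4, 50, 0),
]

def count_hospitalizations(sorted_rows):
    """Count runs of hospital == 1; a new run starts on a new patient,
    after a non-hospital row, or after a gap in coverage."""
    count = 0
    prev = None
    for row in sorted_rows:
        pid, begin, end, hospital = row
        if hospital == 1:
            continues = (
                prev is not None
                and prev[0] == pid
                and prev[3] == 1
                and prev[2] == begin
            )
            if not continues:
                count += 1
        prev = row
    return count

stays = count_hospitalizations(rows)
hospital_rows = sum(1 for r in rows if r[3] == 1)
print(stays, hospital_rows)  # 3 4
```

In this sample there are four hospitalized rows but only three stays, which is the kind of difference the comparison in Part B is meant to reveal.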

Question 5: Counting Outcomes by Medication Usage
Now let’s count the number of deaths, heart attacks, and hospitalizations split by medication usage. Show how many of these outcomes occurred while the patients were taking each medicine (ACE Inhibitors, Beta Blockers, and Statins) – and while they were not taking them. Show your results in tables with each medicine’s status in a row and each outcome’s counts in a column. Only display the results when the value of the medication is measured (not NA).
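The split by medication status can be sketched as a grouped count that skips rows where the medication value is missing. The Python example below uses hypothetical rows for ace only, with None standing in for NA. It sums the row-level indicators for heart attacks and deaths; hospitalizations would additionally need the multi-row handling mentioned in Question 4.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (ace, heart_attack, death); None stands in for NA.
rows = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
    (None, 1, 0),  # medication status unknown: excluded from the table
]

counts = defaultdict(lambda: {"heart.attack": 0, "death": 0})
for ace, heart_attack, death in rows:
    if ace is None:
        continue
    counts[ace]["heart.attack"] += heart_attack
    counts[ace]["death"] += death

# One row per medication status, one column per outcome.
for status in (1, 0):
    print("ace =", status, counts[status])
```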

Question 6: Follow-Up
Each patient may spend some time in follow-up on a medication and other periods not using it. We want to get a sense of how much these medicines are used relative to the available time. A person-year is defined as one year of observation for one patient. For example, 10 person-years can be accumulated by following one person for 10 years, two people for 5 years apiece, three people for 2, 7, and 1 years, respectively, or other combinations. With this in mind, we want to study the utilization of medicines.

How many total person-years of observation do we have in the records? What is the average number of years of observation per patient?
Reminder: Don’t forget to convert your answers into the proper unit of time. Please define a year as 365.25 days. Round your answers to 1 decimal place.
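The unit conversion can be checked with a short calculation. This Python sketch uses hypothetical intervals and sums end - begin over all rows before converting days to years.

```python
DAYS_PER_YEAR = 365.25

# Hypothetical rows: (id, begin, end) in days.
rows = [
    ("A", 0, 400),
    ("A", 400, 1461),
    ("B", 0, 730),
]

total_days = sum(end - begin for _, begin, end in rows)
n_patients = len({pid for pid, _, _ in rows})

total_person_years = total_days / DAYS_PER_YEAR
mean_years_per_patient = total_person_years / n_patients

print(round(total_person_years, 1), round(mean_years_per_patient, 1))  # 6.0 3.0
```

Summing interval lengths, rather than taking the last end value per patient, keeps any gaps in coverage out of the total.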

Question 7: Utilization
How many person-years did the patients spend on each medicine – ACE Inhibitors, Beta Blockers, and Statins? How much time was spent off of them? How much time was missing from observation?
Reminder: Don’t forget to convert your answers into the proper unit of time. Please define a year as 365.25 days. Round your answers to 1 decimal place.
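A sketch of the utilization split for a single medicine is given below in Python. The rows are hypothetical, None stands in for NA, and "missing" is interpreted here as time in rows where the medication value is not recorded, which is one possible reading of the question.

```python
DAYS_PER_YEAR = 365.25

# Hypothetical rows: (begin, end, statin); None stands in for NA.
rows = [
    (0, 365, 1),
    (365, 730, 0),
    (730, 1095, 1),
    (1095, 1461, None),
]

days = {"on": 0, "off": 0, "missing": 0}
for begin, end, statin in rows:
    key = "missing" if statin is None else ("on" if statin == 1 else "off")
    days[key] += end - begin

person_years = {k: round(v / DAYS_PER_YEAR, 1) for k, v in days.items()}
print(person_years)  # {'on': 2.0, 'off': 1.0, 'missing': 1.0}
```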

Question 8: Crude Event Rates
Now we will compare the counts for the outcomes of hospitalization, heart attacks, and death against the relative follow-up time. Compute the crude rates (the mean number of outcomes) per 100 person-years of follow-up. To do this, show the overall amount of follow-up time, the number of events for each outcome, and their ratio in units of events per 100 person-years. Remember to define a year as 365.25 days. Round your results to 1 decimal place.
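The crude rate is the event count divided by the follow-up time, scaled to 100 person-years. The Python sketch below uses made-up totals that are not taken from the assignment data, and the function name crude_rate_per_100 is hypothetical.

```python
DAYS_PER_YEAR = 365.25

def crude_rate_per_100(events, follow_up_days):
    """Events per 100 person-years of follow-up."""
    person_years = follow_up_days / DAYS_PER_YEAR
    return 100 * events / person_years

# Hypothetical totals, not taken from the assignment data.
follow_up_days = 73050   # exactly 200 person-years
deaths = 12

rate = crude_rate_per_100(deaths, follow_up_days)
print(round(rate, 1))  # 6.0
```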

Question 9: Crude Event Rates By Medication Usage
How do the crude rates of hospitalization, heart attacks, and death per 100 person-years of follow-up differ depending on medication usage? Show the number of events and crude rates while taking and not taking each medicine:

·         ACE Inhibitors

·         Beta Blockers

·         Statins

Question 10: Unadjusted Odds Ratios
What is the impact of each medication? One way to calculate their impact is with the unadjusted odds ratio, which compares the rate of outcomes while taking the medicine to the rate without taking the medicine. For reference, an odds ratio less than 1 demonstrates that a factor is associated with a reduction in an outcome, a value greater than 1 shows that the factor is associated with an increase in an outcome, and a value close to 1 shows no association. For each medicine, compute the unadjusted odds ratios for hospitalization, heart attacks, and death. Round your answers to 2 decimal places.

·         ACE Inhibitors

·         Beta Blockers

·         Statins
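Following the description above, which compares the rate of outcomes while taking a medicine to the rate while not taking it, the ratio could be sketched as below. This is Python with hypothetical counts and person-years, and the function name rate_ratio is an assumption. A ratio built from odds rather than rates would need a different calculation, so treat this only as an illustration of the comparison as worded in the question.

```python
def rate_ratio(events_on, years_on, events_off, years_off):
    """Ratio of the crude event rate on the medicine to the rate off it."""
    rate_on = events_on / years_on
    rate_off = events_off / years_off
    return rate_on / rate_off

# Hypothetical counts and person-years, not taken from the assignment data.
ratio = rate_ratio(events_on=10, years_on=200.0, events_off=30, years_off=300.0)
print(round(ratio, 2))  # 0.5
```

A value below 1, as in this made-up example, would indicate fewer events per unit of time while on the medicine than while off it.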
 
