Starting from:

$35

CSDE 502 Assignment 9 Solved

Code 

Show All Code
Hide All Code
CSDE 502  Assignment 9
dcoomes
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
 
library(captioner)
library(tidyverse)
library(magrittr)
library(kableExtra)
 
figure_nums <- captioner(prefix = "Figure")
table_nums <- captioner(prefix = "Table")
Explanation: This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables.

Instructions:

Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
Change the YAML header above to identify yourself and include contact information.
For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
Make sure your output HTML file looks appealing to the reader.
Upload the final Rmd to your github repository.
Download assn_id.txt and include the URL to your Rmd file on github.com.
Create a zip file from your copy of assn_id.txt and upload the zip file to the Canvas site for Assignment 9. The zip file should contain only the text file. Do not include any additional files in the zip file--everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.

Using the full household roster (you'll need to go back the full raw data source, 21600-0001-Data.dta), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code. Include a frequency tabulation and a histogram of each result.

Starting by pulling in the full dataset from GitHub and listing the variables.

add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")
 
metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %% 
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %% 
        # names the variable label vector
        lapply(., function(x) names(x)) %% 
        # as character
        as.character() %% 
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)
 
DT::datatable(metadata)
1.1 
Total number in household

I will use the question "How many people live in household?" to construct the total number in the household. I will not include any observations that reported they don't live in a regular household. As we can see from Table 1 and Figure 1 more households have 4 members as compared to other numbers, and the distribution of those that answered is right-skewed.

add_helth %<%
  mutate(num_house=S27) %%
  mutate(num_house=ifelse(num_house==7|num_house==99, NA, num_house))
add_helth %%
  group_by(num_house) %%
  summarize(n=n()) %%
  mutate(`%`=n/sum(n)*100) %%
  mutate(`%`=`%` %% round(1)) %%
  mutate("cum %"= round(cumsum(n/sum(n)*100), 1)) %%
  kable(caption="Total number of individuals living in household") %%
  kable_styling(full_width=FALSE, position="left", 
                bootstrap_options = c("striped", "hover")) 
Table 1.1: Total number of individuals living in household 
num_house 


cum % 

24 
0.4 
0.4 

239 
3.7 
4.0 

853 
13.1 
17.2 

1564 
24.0 
41.2 

1095 
16.8 
58.0 

822 
12.6 
70.7 
NA 
1907 
29.3 
100.0 
bins <- length(unique(add_helth$num_house))-1
 
ggplot(data=add_helth, mapping=aes(x=num_house)) +
  geom_histogram(bins=bins, color="red", fill="white") + 
  theme_bw() + 
  labs(x="Number of people per household", y="Count")
 

Figure 1.1: Histogram of the number of people per household

1.2 
Number of sisters

1.3 
Number of brothers

1.4 
Total number of siblings


What proportion of students live with two biological parents? Include the analysis in your R code.


Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.

3.1 Source code
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = '\n')
---
title: "CSDE 502 Winter 2021, Assignment 8"
author: "[dcoomes](mailto:dcoomes@uw.edu)"
output: 
    bookdown::html_document2:
        number_sections: true
        self_contained: true
        code_folding: hide
        toc: true
        toc_float:
            collapsed: true
            smooth_scroll: false
    pdf_document:
        number_sections: true
        toc: true
        fig_cap: yes
        keep_tex: yes
urlcolor: blue 
---
 
```{r, warning=FALSE, message=FALSE}
 
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
 
library(captioner)
library(tidyverse)
library(magrittr)
library(kableExtra)
 
figure_nums <- captioner(prefix = "Figure")
table_nums <- captioner(prefix = "Table")
```
 
___Explanation___:
This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables. 
 
___Instructions___: 
 
1. Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
1. Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
1. Change the YAML header above to identify yourself and include contact information.
1. For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
1. Make sure your output HTML file looks appealing to the reader.
1. Upload the final Rmd to your github repository.
1. Download [`assn_id.txt`](http://staff.washington.edu/phurvitz/csde502_winter_2021/assignments/assn_id.txt) and include the URL to your Rmd file on github.com.
1. Create a zip file from your copy of `assn_id.txt` and upload the zip file to the Canvas site for Assignment 9. ___The zip file should contain only the text file. Do not include any additional files in the zip file--everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.___
 
 
#
__Using the full household roster (you'll need to go back the full raw data source, [21600-0001-Data.dta](http://staff.washington.edu/phurvitz/csde502_winter_2021/data/21600-0001-Data.dta.zip)), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code.  Include a frequency tabulation and a histogram of each result.__
 
Starting by pulling in the full dataset from GitHub and listing the variables.
 
```{r, cache=TRUE, results='hide'}
 
add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")
 
metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %% 
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %% 
        # names the variable label vector
        lapply(., function(x) names(x)) %% 
        # as character
        as.character() %% 
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)
 
DT::datatable(metadata)
 
```
 
##
__Total number in household__
 
I will use the question "How many people live in household?" to construct the total number in the household. I will not include any observations that reported they don't live in a regular household. As we can see from **`r table_nums(name="numtable", display="cite")`** and **`r figure_nums(name="numhist", display="cite")`** more households have 4 members as compared to other numbers, and the distribution of those that answered is right-skewed.
 
```{r}
 
add_helth %<%
  mutate(num_house=S27) %%
  mutate(num_house=ifelse(num_house==7|num_house==99, NA, num_house))
 
```
 
```{r numtable}
 
add_helth %%
  group_by(num_house) %%
  summarize(n=n()) %%
  mutate(`%`=n/sum(n)*100) %%
  mutate(`%`=`%` %% round(1)) %%
  mutate("cum %"= round(cumsum(n/sum(n)*100), 1)) %%
  kable(caption="Total number of individuals living in household") %%
  kable_styling(full_width=FALSE, position="left", 
                bootstrap_options = c("striped", "hover")) 
 
```
 
```{r numhist, fig.cap="Histogram of the number of people per household"}
 
bins <- length(unique(add_helth$num_house))-1
 
ggplot(data=add_helth, mapping=aes(x=num_house)) +
  geom_histogram(bins=bins, color="red", fill="white") + 
  theme_bw() + 
  labs(x="Number of people per household", y="Count")
 
```
 
##
__Number of sisters__
 
 
##
__Number of brothers__
 
 
##
__Total number of siblings__
 
 
#
__What proportion of students live with two biological parents? Include the analysis in your R code.__
 
 
#
__Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.__
 
## Source code
```{r comment=''}
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = '\n')
```

More products