CSDE 502 Assignment 9
dcoomes
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
library(captioner)
library(tidyverse)
library(magrittr)
library(kableExtra)
figure_nums <- captioner(prefix = "Figure")
table_nums <- captioner(prefix = "Table")
Explanation: This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables.
Instructions:
Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
Change the YAML header above to identify yourself and include contact information.
For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
Make sure your output HTML file looks appealing to the reader.
Upload the final Rmd to your github repository.
Download assn_id.txt and include the URL to your Rmd file on github.com.
Create a zip file from your copy of assn_id.txt and upload the zip file to the Canvas site for Assignment 9. The zip file should contain only the text file. Do not include any additional files in the zip file--everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.
1
Using the full household roster (you'll need to go back to the full raw data source, 21600-0001-Data.dta), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code. Include a frequency tabulation and a histogram of each result.
I start by reading the full dataset from GitHub and building a table of variable names, labels, formats, and value labels.
add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")
metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %>%
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %>%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %>%
        # names the variable label vector
        lapply(., function(x) names(x)) %>%
        # as character
        as.character() %>%
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)
DT::datatable(metadata)
1.1
Total number in household
I use the question "How many people live in household?" (variable S27) to construct the total number of people in the household. I exclude respondents who reported that they do not live in a regular household, along with other non-valid responses, by recoding codes 7 and 99 to NA. As Table 1.1 and Figure 1.1 show, four-person households are the most common, and among respondents with a valid answer the counts fall off more slowly toward larger households than toward smaller ones.
# Copy S27 into a working variable, then set the non-valid codes 7 and 99 to NA
add_helth %<>%
    mutate(num_house = S27) %>%
    mutate(num_house = ifelse(num_house == 7 | num_house == 99, NA, num_house))
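The missing-value decision can be checked on a small made-up vector before it is applied to the full data. The sketch below uses base R only and assumes, as in the narrative, that codes 7 and 99 are the ones to treat as missing; `s27_toy` and `missing_codes` are hypothetical names, not Add Health variables.

```r
# Toy stand-in for S27; the values are invented for illustration only.
s27_toy <- c(1, 4, 6, 7, 99, 4, NA)

# Codes treated as missing (assumption taken from the narrative above).
missing_codes <- c(7, 99)

# %in% never returns NA, so values that are already NA pass through unchanged.
num_house_toy <- ifelse(s27_toy %in% missing_codes, NA, s27_toy)

print(num_house_toy)

# Tabulate with NA shown explicitly, mirroring the frequency table.
print(table(num_house_toy, useNA = "ifany"))
```

Keeping the reserved codes in one named vector makes the decision easy to document and to change if the codebook defines additional non-valid codes.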
add_helth %>%
    group_by(num_house) %>%
    summarize(n = n()) %>%
    mutate(`%` = n / sum(n) * 100) %>%
    mutate(`%` = `%` %>% round(1)) %>%
    mutate("cum %" = round(cumsum(n / sum(n) * 100), 1)) %>%
    kable(caption = "Total number of individuals living in household") %>%
    kable_styling(full_width = FALSE, position = "left",
                  bootstrap_options = c("striped", "hover"))
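Table 1.1: Total number of individuals living in household

| num_house |    n |    % | cum % |
|----------:|-----:|-----:|------:|
|         1 |   24 |  0.4 |   0.4 |
|         2 |  239 |  3.7 |   4.0 |
|         3 |  853 | 13.1 |  17.2 |
|         4 | 1564 | 24.0 |  41.2 |
|         5 | 1095 | 16.8 |  58.0 |
|         6 |  822 | 12.6 |  70.7 |
|        NA | 1907 | 29.3 | 100.0 |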
bins <- length(unique(add_helth$num_house))-1
ggplot(data=add_helth, mapping=aes(x=num_house)) +
geom_histogram(bins=bins, color="red", fill="white") +
theme_bw() +
labs(x="Number of people per household", y="Count")
Figure 1.1: Histogram of the number of people per household
1.2
Number of sisters
1.3
Number of brothers
1.4
Total number of siblings
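One possible approach to the three sibling counts is to count matching entries across the roster's relationship columns for each respondent. The sketch below is not run on the Add Health file: it assumes a wide roster with one relationship column per household member slot, and the column names (`rel_a`, `rel_b`, `rel_c`), the text labels, and the helper `count_rel()` are all hypothetical. The real roster variable names and numeric codes would need to be looked up in the metadata table.

```r
# Hypothetical wide roster: one relationship column per household member slot.
roster <- data.frame(
  aid   = c("r1", "r2", "r3"),
  rel_a = c("mother", "sister", "father"),
  rel_b = c("sister", "brother", NA),
  rel_c = c("sister", NA, NA),
  stringsAsFactors = FALSE
)

rel_cols <- c("rel_a", "rel_b", "rel_c")

# Count how many roster slots hold a given relationship label.
# na.rm = TRUE treats an empty slot as "no such member", not as missing;
# that is a decision that should be documented in the narrative.
count_rel <- function(df, cols, label) {
  rowSums(df[, cols, drop = FALSE] == label, na.rm = TRUE)
}

roster$n_sisters  <- count_rel(roster, rel_cols, "sister")
roster$n_brothers <- count_rel(roster, rel_cols, "brother")
roster$n_siblings <- roster$n_sisters + roster$n_brothers

print(roster[, c("aid", "n_sisters", "n_brothers", "n_siblings")])

# Frequency tabulation of the total, as the question asks.
print(table(roster$n_siblings, useNA = "ifany"))
```

Whether half, step, adoptive, and foster siblings count as siblings is a definitional choice. Under this sketch it would be made by choosing which labels are passed to `count_rel()`.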
2
What proportion of students live with two biological parents? Include the analysis in your R code.
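A minimal sketch of the proportion calculation, assuming that two per-respondent indicators have already been derived from the roster: `bio_mother` and `bio_father` are hypothetical logical columns, and the data are invented.

```r
# Hypothetical per-respondent indicators derived from the household roster.
hh <- data.frame(
  aid        = c("r1", "r2", "r3", "r4", "r5"),
  bio_mother = c(TRUE, TRUE, FALSE, TRUE, NA),
  bio_father = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# TRUE only when both biological parents are in the household.
# In R, NA & TRUE is NA, while NA & FALSE is FALSE.
hh$two_bio <- hh$bio_mother & hh$bio_father

# Proportion among respondents whose status is known.
prop_two_bio <- mean(hh$two_bio, na.rm = TRUE)
print(prop_two_bio)

# Number of respondents dropped from the denominator.
print(sum(is.na(hh$two_bio)))
```

Reporting the number of respondents with unknown status alongside the proportion makes the choice of denominator explicit.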
3
Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.
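A sketch of one way to count the remaining household members, again on invented data: each non-empty roster slot counts unless its relationship is in an exclusion list. The column names and relationship labels are hypothetical stand-ins for the real roster variables and codes.

```r
# Hypothetical wide roster with text relationship labels.
roster <- data.frame(
  aid   = c("r1", "r2", "r3"),
  rel_a = c("bio mother", "bio mother", "grandmother"),
  rel_b = c("bio father", "stepfather", "aunt"),
  rel_c = c("full sister", "half brother", NA),
  stringsAsFactors = FALSE
)

# Relationships that should NOT be counted.
excluded <- c("bio mother", "bio father", "full brother", "full sister")

rel <- as.matrix(roster[, c("rel_a", "rel_b", "rel_c")])

# %in% drops the matrix shape, so rebuild it before combining.
is_excluded <- matrix(rel %in% excluded, nrow = nrow(rel))

# A slot counts when it is filled and its relationship is not excluded.
roster$n_other <- rowSums(!is.na(rel) & !is_excluded)

print(roster[, c("aid", "n_other")])

# Frequency table of the new variable.
print(table(roster$n_other, useNA = "ifany"))
```

On the real data the same count could be passed to `ggplot()` with `geom_histogram()`, as was done for `num_house`.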
3.1 Source code
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = '\n')
---
title: "CSDE 502 Winter 2021, Assignment 9"
author: "[dcoomes](mailto:dcoomes@uw.edu)"
output:
    bookdown::html_document2:
        number_sections: true
        self_contained: true
        code_folding: hide
        toc: true
        toc_float:
            collapsed: true
            smooth_scroll: false
    pdf_document:
        number_sections: true
        toc: true
        fig_cap: yes
        keep_tex: yes
urlcolor: blue
---
```{r, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
library(captioner)
library(tidyverse)
library(magrittr)
library(kableExtra)
figure_nums <- captioner(prefix = "Figure")
table_nums <- captioner(prefix = "Table")
```
___Explanation___:
This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables.
___Instructions___:
1. Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
1. Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
1. Change the YAML header above to identify yourself and include contact information.
1. For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
1. Make sure your output HTML file looks appealing to the reader.
1. Upload the final Rmd to your github repository.
1. Download [`assn_id.txt`](http://staff.washington.edu/phurvitz/csde502_winter_2021/assignments/assn_id.txt) and include the URL to your Rmd file on github.com.
1. Create a zip file from your copy of `assn_id.txt` and upload the zip file to the Canvas site for Assignment 9. ___The zip file should contain only the text file. Do not include any additional files in the zip file--everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.___
#
__Using the full household roster (you'll need to go back to the full raw data source, [21600-0001-Data.dta](http://staff.washington.edu/phurvitz/csde502_winter_2021/data/21600-0001-Data.dta.zip)), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code. Include a frequency tabulation and a histogram of each result.__
I start by reading the full dataset from GitHub and building a table of variable names, labels, formats, and value labels.
```{r, cache=TRUE, results='hide'}
add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")
metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %>%
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %>%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %>%
        # names the variable label vector
        lapply(., function(x) names(x)) %>%
        # as character
        as.character() %>%
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)
DT::datatable(metadata)
```
##
__Total number in household__
I use the question "How many people live in household?" (variable `S27`) to construct the total number of people in the household. I exclude respondents who reported that they do not live in a regular household, along with other non-valid responses, by recoding codes 7 and 99 to `NA`. As **Table \@ref(tab:numtable)** and **Figure \@ref(fig:numhist)** show, four-person households are the most common, and among respondents with a valid answer the counts fall off more slowly toward larger households than toward smaller ones.
```{r}
# Copy S27 into a working variable, then set the non-valid codes 7 and 99 to NA
add_helth %<>%
    mutate(num_house = S27) %>%
    mutate(num_house = ifelse(num_house == 7 | num_house == 99, NA, num_house))
```
```{r numtable}
add_helth %>%
    group_by(num_house) %>%
    summarize(n = n()) %>%
    mutate(`%` = n / sum(n) * 100) %>%
    mutate(`%` = `%` %>% round(1)) %>%
    mutate("cum %" = round(cumsum(n / sum(n) * 100), 1)) %>%
    kable(caption = "Total number of individuals living in household") %>%
    kable_styling(full_width = FALSE, position = "left",
                  bootstrap_options = c("striped", "hover"))
```
```{r numhist, fig.cap="Histogram of the number of people per household"}
bins <- length(unique(add_helth$num_house))-1
ggplot(data=add_helth, mapping=aes(x=num_house)) +
geom_histogram(bins=bins, color="red", fill="white") +
theme_bw() +
labs(x="Number of people per household", y="Count")
```
##
__Number of sisters__
##
__Number of brothers__
##
__Total number of siblings__
#
__What proportion of students live with two biological parents? Include the analysis in your R code.__
#
__Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.__
## Source code
```{r comment=''}
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = '\n')
```