Applied Statistics-Online Assignment Solved

Part 1
1. First, select a dataset on a topic that you are interested in. Be sure that the dataset has at least two quantitative variables.

Here are some sources for a dataset to consider:

•      Kaggle

•      Tidy Tuesday

•      Data is Plural

•      ICPSR

•      UCI Machine Learning Repository

•      FiveThirtyEight

•      Google’s Dataset Search

If your dataset is located online, provide a link to the dataset. If the dataset is from another source, provide a brief description of the dataset and indicate how you have access to the dataset.
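One way to confirm that a candidate dataset has at least two quantitative variables is to count its numeric columns. The sketch below is only an illustration: it uses the built-in `mtcars` data as a stand-in for your own dataset, and the names `my_data`, `is_quantitative`, and `quantitative_vars` are hypothetical. A numeric column can still be a coded category, so the result should be read alongside the dataset's documentation.

```r
# mtcars ships with R and stands in for your own dataset here (assumption).
my_data = mtcars

# Flag the columns that R stores as numeric.
is_quantitative = sapply(my_data, is.numeric)
quantitative_vars = names(my_data)[is_quantitative]

# List the candidate quantitative variables and check there are at least two.
quantitative_vars
length(quantitative_vars) >= 2
```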

Part 2
We’ll rely on R to help us create “fake” data and then practice understanding a linear model applied to this data.

We’ll provide some initial information to R to set up our data, including our sample size, our x values, and other population characteristics:

sample_size = 21

x_vals = seq(from = 0, to = 10, length.out = sample_size)

sigma = 3
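As a quick check on this setup, the following sketch inspects the grid of x values. With 21 points spread evenly from 0 to 10, consecutive values should be 0.5 apart. It repeats the setup lines so that it runs on its own.

```r
sample_size = 21
x_vals = seq(from = 0, to = 10, length.out = sample_size)
sigma = 3

# Number of x values and the range they cover.
length(x_vals)
range(x_vals)

# Gaps between consecutive x values.
diff(x_vals)
```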

1.    Replace the seed in the following code with your birthdate in mmddyyyy form. The template uses June 13, 1876, the birthday of William Gosset (pen name “Student”); the seed below has already been replaced.

set.seed(02162001)
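The seed matters because it makes the "random" draws reproducible. The sketch below shows this in isolation: resetting the same seed before each call gives identical values. The names `first_draw` and `second_draw` are illustrative only.

```r
# Draw three values after setting the seed.
set.seed(02162001)
first_draw = rnorm(n = 3, mean = 0, sd = 3)

# Reset the same seed and draw again.
set.seed(02162001)
second_draw = rnorm(n = 3, mean = 0, sd = 3)

# The two sets of draws match exactly.
identical(first_draw, second_draw)

# R reads 02162001 as the number 2162001, so the leading zero has no effect.
02162001 == 2162001
```

A birthdate that begins with zero therefore gives the same seed as the same digits without the zero.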

2.    Next, we’ll set some important characteristics for our data. We start by generating the randomness of our data.

epsilon = rnorm(n = sample_size, mean = 0, sd = sigma)

Now, generate the values of y based on the following relationship:

Y = 7 − 1.4x + ε

where ε ∼ N(0, σ² = 9) (independently).

The values of ε have been generated in the above code chunk. Save the values of y as y_vals in R.

# Use this code chunk for your answer.

y_vals = 7 - (1.4 * x_vals) + epsilon
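A simple sanity check on this step is to subtract the deterministic part of the model from the simulated responses, which should give back the noise term. The sketch below repeats the earlier setup so it runs on its own; `recovered_noise` is an illustrative name, not part of the assignment.

```r
sample_size = 21
x_vals = seq(from = 0, to = 10, length.out = sample_size)
sigma = 3

set.seed(02162001)
epsilon = rnorm(n = sample_size, mean = 0, sd = sigma)
y_vals = 7 - (1.4 * x_vals) + epsilon

# Removing the line 7 - 1.4x from y leaves the noise (up to rounding error).
recovered_noise = y_vals - (7 - 1.4 * x_vals)
all.equal(recovered_noise, epsilon)

# Sample mean and standard deviation of the noise. With only 21 draws these
# will be near, but not exactly equal to, 0 and sigma.
mean(epsilon)
sd(epsilon)
```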

3.    Uncomment the following line of code to create a data frame that contains both x and y.

sim_data = data.frame(x_vals, y_vals)

Report the dimensions of this data frame. Print the first few rows of this data frame. Calculate the correlation between x & y.

# Use this code chunk for your answer.

dim(sim_data)

## [1] 21  2

head(sim_data)

##   x_vals    y_vals
## 1    0.0 10.792018
## 2    0.5  8.284465
## 3    1.0  7.798642
## 4    1.5 11.060227
## 5    2.0  3.846448
## 6    2.5 -2.172807

cor(sim_data$x_vals, sim_data$y_vals)

## [1] -0.8234135
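To connect this correlation to the linear model, the sketch below fits a simple linear regression with `lm()` and checks two standard identities: R-squared equals the squared correlation, and the fitted slope equals the correlation times sd(y) / sd(x). This goes a step beyond what the question asks, and `fit` and `r` are illustrative names. The setup is repeated so the block runs on its own.

```r
sample_size = 21
x_vals = seq(from = 0, to = 10, length.out = sample_size)
sigma = 3

set.seed(02162001)
epsilon = rnorm(n = sample_size, mean = 0, sd = sigma)
y_vals = 7 - (1.4 * x_vals) + epsilon
sim_data = data.frame(x_vals, y_vals)

# Fit y on x; the estimates should land near the true values 7 and -1.4.
fit = lm(y_vals ~ x_vals, data = sim_data)
coef(fit)

# In simple linear regression, R-squared is the squared correlation.
r = cor(sim_data$x_vals, sim_data$y_vals)
all.equal(summary(fit)$r.squared, r^2)

# The fitted slope is the correlation rescaled by sd(y) / sd(x).
all.equal(unname(coef(fit)["x_vals"]), r * sd(y_vals) / sd(x_vals))
```

The negative correlation and the negative fitted slope both reflect the true slope of −1.4 used to generate the data.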

calculated using these values of which 2 is included and 12 is not.
