
PSTAT131 Homework 1 Solution


Machine Learning Main Ideas
Please answer the following questions. Be sure that your solutions are clearly marked and that your document is neatly formatted.
Question 1:
Define supervised and unsupervised learning. What are the difference(s) between them?
Question 2:
Explain the difference between a regression model and a classification model, specifically in the context of machine learning.
Question 3:
Name two commonly used metrics for regression ML problems. Name two commonly used metrics for classification ML problems.
Question 4:
As discussed, statistical models can be used for different purposes. These purposes can generally be classified into the following three categories. Provide a brief description of each.
Descriptive models:
Inferential models:
Predictive models:
Question 5:
Predictive models are frequently used in machine learning, and they can usually be described as either mechanistic or empirically-driven. Answer the following questions.
Define mechanistic. Define empirically-driven. How do these model types differ? How are they similar?
In general, is a mechanistic or empirically-driven model easier to understand? Explain your choice.
Describe how the bias-variance tradeoff is related to the use of mechanistic or empirically-driven models.
Question 6:
A political candidate’s campaign has collected some detailed voter history data from their constituents. The campaign is interested in two questions:
Given a voter’s profile/data, how likely is it that they will vote in favor of the candidate?
How would a voter’s likelihood of support for the candidate change if they had personal contact with the candidate?
Classify each question as either predictive or inferential. Explain your reasoning for each.
Exploratory Data Analysis
This section will ask you to complete several exercises. For this homework assignment, we’ll be working with the mpg data set that is loaded when you load the tidyverse. Make sure you load the tidyverse and any other packages you need.
Exploratory data analysis (or EDA) is not based on a specific set of rules or formulas. It is more of a state of curiosity about data. It’s an iterative process of:
generating questions about data
visualizing and transforming your data as necessary to get answers
using what you learned to generate more questions
Two questions are always useful when you start out: “what variation occurs within the variables?” and “what covariation occurs between the variables?”
You should use the tidyverse and ggplot2 for these exercises.
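As a starting point for the two questions above, the following is a minimal sketch of loading the data and taking a first look at variation and covariation. It assumes the tidyverse is installed; the choice of hwy and cty as the variables to inspect is only illustrative.

```r
library(tidyverse)  # attaches ggplot2, which provides the mpg data set

# Structure of the data: column names, types, and the first few values
glimpse(mpg)

# Variation within one variable: quartiles, mean, and range of hwy
summary(mpg$hwy)

# Covariation between two variables: Pearson correlation of hwy and cty
hwy_cty_cor <- cor(mpg$hwy, mpg$cty)
hwy_cty_cor
```

Numeric summaries like these are only a first pass; the exercises below ask for plots, which usually reveal more than a single number.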
Exercise 1:
We are interested in highway miles per gallon, or the hwy variable. Create a histogram of this variable. Describe what you see/learn.
Exercise 2:
Create a scatterplot. Put hwy on the x-axis and cty on the y-axis. Describe what you notice. Is there a relationship between hwy and cty? What does this mean?
Exercise 3:
Make a bar plot of manufacturer. Flip it so that the manufacturers are on the y-axis. Order the bars by height. Which manufacturer produced the most cars? Which produced the least?
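One possible approach to the ordering, sketched under the assumption that counting first and then reordering the factor is acceptable: count the rows per manufacturer, reorder the manufacturer levels by that count with fct_reorder() from forcats, and map manufacturer to the y aesthetic. The names manufacturer_counts and bar_plot are illustrative, not part of the assignment.

```r
library(tidyverse)

# Count cars per manufacturer, then order the factor levels by that count
manufacturer_counts <- mpg %>%
  count(manufacturer) %>%
  mutate(manufacturer = fct_reorder(manufacturer, n))

# Mapping manufacturer to y puts the bars on the y-axis directly
bar_plot <- ggplot(manufacturer_counts, aes(x = n, y = manufacturer)) +
  geom_col() +
  labs(x = "Number of cars", y = "Manufacturer")

bar_plot
```

Mapping manufacturer to y has the same visual effect as building a vertical bar plot and adding coord_flip(); either should satisfy the "flip" instruction.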
Exercise 4:
Make a box plot of hwy, grouped by cyl. Do you see a pattern? If so, what?
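A minimal sketch of the grouping step, assuming the tidyverse is loaded. Because cyl is stored as an integer in mpg, it is wrapped in factor() here so that ggplot2 draws one box per cylinder count rather than treating it as a continuous axis. The name box_plot is illustrative.

```r
library(tidyverse)

# factor(cyl) turns the integer cylinder count into discrete groups
box_plot <- ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Highway miles per gallon")

box_plot
```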
Exercise 5:
Use the corrplot package to make a lower triangle correlation matrix of the mpg dataset. (Hint: You can find information on the package here.)
Which variables are positively or negatively correlated with which others? Do these relationships make sense to you? Are there any that surprise you?
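A sketch of one way to build the matrix, assuming the corrplot package is installed. Correlation is only defined for numeric columns, so the character columns of mpg are dropped first; whether to keep year and cyl as numeric is a judgment call this sketch does not settle. The names mpg_numeric and mpg_cor are illustrative.

```r
library(tidyverse)
library(corrplot)

# Keep only the numeric columns (displ, year, cyl, cty, hwy)
mpg_numeric <- mpg %>% select(where(is.numeric))

# Pairwise Pearson correlations between the numeric columns
mpg_cor <- cor(mpg_numeric)

# type = "lower" draws only the lower triangle of the matrix
corrplot(mpg_cor, type = "lower", method = "number")
```

Using method = "number" prints the coefficients themselves; the default circle display may be easier to scan for sign and strength.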
231 Students Only:
Exercise 6:

Exercise 7:
Recreate the following graphic.

Exercise 8:
Recreate the following graphic.
