ECO481 Assignment 2 Solution

1 Exercise 1: Multiple-choice questions (please provide an explanation for your choice) (12 points)
1. Which statement is more likely to be true?
a. A classifier trained on less training data is less likely to overfit.
b. When the feature space is larger, overfitting is less likely.
c. As the number of training examples goes to infinity, your model trained on that data will have a lower variance.
d. As the number of training examples goes to infinity, your model trained on that data will have a lower bias.
2. Suppose I give you the following information:

3. Which of the following statements is true for both the Naïve Bayes classifier and decision trees?
a. In both classifiers, a pair of features are assumed to be independent.
b. In both classifiers, a pair of features are assumed to be dependent.
c. In both classifiers, a pair of features are assumed to be independent given the class label.
d. In both classifiers, a pair of features are assumed to be dependent given the class label.
2 Exercise 2: Text classification using Naive Bayes (25 points)
We want to classify some texts using a Naive Bayes classifier.

The potential labels are: technical, financial and irrelevant.
In addition, we know the word frequencies:

Assume that “0.” denotes a very small value.
Moreover, the prior distribution is given by: 50% for technical, 40% for financial, and 10% for irrelevant. Lastly, ignore all words that are not in the previous table.
1. Apply the Naive Bayes classifier to those texts. Provide a full explanation of all the steps and computations that lead to your results (a numerical sketch follows this list). (15 points)
2. In the next step, we would like to focus on expressions like “network capacity”.
a. How does a Naive Bayes classifier handle such expressions? (Hint: you should discuss two cases.) (5 points)
b. What Python package and module will you use to build the expressions? Be specific and clearly explain the different steps you will take. (5 points)
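For question 1, the original word-frequency table is not reproduced above, so the numbers below are hypothetical placeholders; the sketch only illustrates the mechanics of the computation, namely scoring each label by its prior times the product of the word likelihoods (in log space to avoid underflow).

# Hedged numerical sketch for question 1: frequencies are made-up placeholders.
import numpy as np

priors = {"technical": 0.50, "financial": 0.40, "irrelevant": 0.10}

# P(word | label); these numbers stand in for the ones in the assignment table.
likelihoods = {
    "technical": {"network": 0.30, "capacity": 0.20, "stock": 0.01},
    "financial": {"network": 0.02, "capacity": 0.05, "stock": 0.40},
    "irrelevant": {"network": 0.05, "capacity": 0.05, "stock": 0.05},
}

def classify(words):
    # Work in log space to avoid underflow; words absent from the table are ignored,
    # as the exercise instructs.
    scores = {}
    for label, prior in priors.items():
        log_score = np.log(prior)
        for w in words:
            if w in likelihoods[label]:
                log_score += np.log(likelihoods[label][w])
        scores[label] = log_score
    return max(scores, key=scores.get), scores

print(classify(["network", "capacity"]))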
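For question 2b, one possible choice (an assumption, not necessarily the intended answer) is scikit-learn's feature_extraction.text module: CountVectorizer with ngram_range=(1, 2) turns expressions such as “network capacity” into features of their own. gensim.models.phrases.Phrases is another common option. The toy texts below are illustrative only.

# Build unigram and bigram features so that "network capacity" becomes a single feature.
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "the network capacity was upgraded",      # hypothetical example text
    "quarterly revenue beat expectations",    # hypothetical example text
]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigrams and bigrams
X = vectorizer.fit_transform(texts)

# The vocabulary now contains the expression "network capacity" as one feature.
print([t for t in vectorizer.get_feature_names_out() if " " in t])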
3 Exercise 3: MSE (20 points)
You are given the following data points.

1. You are doing 3-fold cross-validation. Each time, the model is learned from the non-left-out data points. Assume you use a trivial algorithm that predicts a constant y = c. What is the mean square error from the 3-fold cross-validation? (10 points)
2. You are doing a 3-fold cross-validation. Each time, the model is learned from the non-left-out data points. What is the mean square error from the 3-fold cross-validation assuming you fit a linear regression? (10 points)
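Because the data table is not reproduced above, the points below are hypothetical placeholders; the sketch only shows the mechanics of the 3-fold cross-validation for both models (for the trivial model, the constant that minimises the training MSE is the mean of the training y values).

# Minimal sketch of the 3-fold CV computation on made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data points
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])   # hypothetical data points

folds = np.array_split(np.arange(len(x)), 3)    # three folds, in order

def cv_mse(fit, predict):
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), test_idx)
        model = fit(x[train_idx], y[train_idx])
        predictions = predict(model, x[test_idx])
        errors.append(np.mean((y[test_idx] - predictions) ** 2))
    return np.mean(errors)

# Question 1: trivial model predicting the training mean.
mse_constant = cv_mse(lambda xs, ys: ys.mean(),
                      lambda c, xs: np.full_like(xs, c))

# Question 2: simple linear regression y = a*x + b fitted by least squares.
mse_linear = cv_mse(lambda xs, ys: np.polyfit(xs, ys, 1),
                    lambda coef, xs: np.polyval(coef, xs))

print(mse_constant, mse_linear)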
4 Exercise 4: Build a Covid uncertainty index (43 points)
You work at the Federal Reserve in the US. You are in charge of analysing the impact of Covid on the stock market over the last 30 days. You also want to assess the general sentiment in the news over this period. Illustrative Python sketches of the main steps are given after the list of questions.
1. Read the file NYT headline.csv in Python and drop the duplicates (1 point).
2. Build a vocabulary of Covid-19 related words (3 points).
3. Combine the different headlines by day (1 point).
4. Use topic modelling to identify the key topics of the headlines (10 points).
NB: Find the optimal number of topics, name the topics, and display them using word clouds.
5. Using the vocabulary constructed, build a daily Covid-related index (which we will call the Covid uncertainty index) as the fraction of Covid-related articles out of the total number of articles each day (5 points).
6. Use the following words, “uncertainty”, “uncertain”, “economic”, “economy”, “Congress”, “deficit”, “Federal Reserve”, “legislation”, “regulation”, “White House”, “uncertainties”, “regulatory”, or “the Fed”, to construct a daily economic policy uncertainty index. In the same manner as for the Covid uncertainty index, build this index as the fraction of articles that use at least one of those words. We will call it a coarse economic policy uncertainty index (3 points).
7. Can you argue why this is not a very good way of assessing economic policy uncertainty (this is why it is “coarse”) (3 points)?
8. Use the variable “Adj Close” to compute the return on the S&P 500 (^GSPC) (3 points).
9. Using a plot and simple correlations, show the link between the Covid uncertainty index, the coarse economic policy uncertainty index, and the returns. Comment on your findings (3 points).
10. Select the articles that contain at least one word in the Covid-related dictionary you constructed. For those articles, use the VADER sentiment lexicon and construct:
a) a daily sentiment index, and plot it. (Only consider the dates with a Covid-related word; the dates without any Covid-related word are treated as missing values.) Include the three dimensions: Negative, Neutral and Positive (5 points).
b) an aggregate sentiment over the whole period of the database. Include the three dimensions: Negative, Neutral and Positive (3 points).
11. Your boss asks you to write a short paragraph highlighting your key findings from this study.
What will this paragraph look like? (No more than 5 lines) (3 points).
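A minimal pandas sketch of steps 1-3, 5 and 6, assuming the headline file has columns named date and headline (hypothetical names; adjust them to the actual CSV) and using an illustrative Covid vocabulary for step 2:

# Steps 1-3 and 5-6: load, deduplicate, aggregate by day, and build the two indices.
import pandas as pd

df = pd.read_csv("NYT headline.csv").drop_duplicates()          # step 1
df["date"] = pd.to_datetime(df["date"]).dt.date

covid_vocab = ["covid", "coronavirus", "pandemic", "lockdown", "quarantine"]   # step 2 (example)
epu_vocab = ["uncertainty", "uncertain", "uncertainties", "economic", "economy",
             "congress", "deficit", "federal reserve", "the fed", "legislation",
             "regulation", "regulatory", "white house"]                         # step 6 word list

def mentions(text, vocab):
    text = str(text).lower()
    return any(word in text for word in vocab)

df["covid"] = df["headline"].apply(mentions, vocab=covid_vocab)
df["epu"] = df["headline"].apply(mentions, vocab=epu_vocab)

daily = df.groupby("date").agg(
    combined=("headline", " ".join),      # step 3: headlines combined per day
    covid_index=("covid", "mean"),        # step 5: share of Covid-related articles
    epu_index=("epu", "mean"),            # step 6: coarse EPU index
)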
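For step 4, one possible route (an assumption, not the only one) is scikit-learn's LatentDirichletAllocation together with the third-party wordcloud package; gensim's LdaModel with a coherence score is an equally valid way to pick the number of topics. daily["combined"] is the per-day text built in the previous sketch.

# Step 4: fit LDA for several candidate topic counts and draw one word cloud per topic.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from wordcloud import WordCloud

vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=2)
dtm = vectorizer.fit_transform(daily["combined"])
terms = vectorizer.get_feature_names_out()

# Keep the topic count with the highest approximate log-likelihood on the corpus
# (a held-out split or gensim's coherence score would be more rigorous).
best_k, best_score, best_model = None, -float("inf"), None
for k in (3, 5, 8, 10):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(dtm)
    score = lda.score(dtm)
    if score > best_score:
        best_k, best_score, best_model = k, score, lda

# One word cloud per topic, weighted by the topic-word distribution.
for t, weights in enumerate(best_model.components_):
    freqs = {terms[i]: weights[i] for i in weights.argsort()[-30:]}
    WordCloud(background_color="white").generate_from_frequencies(freqs).to_file(f"topic_{t}.png")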
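A sketch of steps 8-10, continuing from the data frames built in the first sketch and assuming the S&P 500 prices sit in a file GSPC.csv with columns Date and Adj Close (the file and column names are assumptions). VADER is used through NLTK; run nltk.download("vader_lexicon") once beforehand.

# Steps 8-10: returns, correlations with the two indices, and VADER sentiment.
import matplotlib.pyplot as plt
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sp = pd.read_csv("GSPC.csv", parse_dates=["Date"])
sp["return"] = sp["Adj Close"].pct_change()            # step 8: daily return on ^GSPC
sp["date"] = sp["Date"].dt.date

merged = daily.reset_index().merge(sp[["date", "return"]], on="date", how="left")

# Step 9: plot the two indices against the returns and look at simple correlations.
merged.set_index("date")[["covid_index", "epu_index", "return"]].plot(subplots=True)
plt.show()
print(merged[["covid_index", "epu_index", "return"]].corr())

# Step 10: VADER sentiment on the Covid-related articles only.
sia = SentimentIntensityAnalyzer()
covid_articles = df[df["covid"]].copy()
scores = covid_articles["headline"].apply(lambda h: pd.Series(sia.polarity_scores(h)))
covid_articles[["neg", "neu", "pos"]] = scores[["neg", "neu", "pos"]]

daily_sentiment = covid_articles.groupby("date")[["neg", "neu", "pos"]].mean()   # 10a
overall_sentiment = covid_articles[["neg", "neu", "pos"]].mean()                 # 10b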
References:
Baker, Scott R., Nicholas Bloom, and Steven J. Davis. “Measuring Economic Policy Uncertainty.” The Quarterly Journal of Economics 131, no. 4 (2016): 1593-1636.
