CS4395 Homework 8 - Solved

1. Read in the csv file using pandas. Convert the author column to categorical data. Display the first
few rows. Display the counts by author.
2. Divide into train and test, with 80% in train. Use random state 1234. Display the shape of train and
test.
3. Preprocess the text by removing stop words and applying tf-idf vectorization. Fit the vectorizer on
the training data only, then apply it to both train and test. Output the training set shape and the
test set shape.
4. Try a Bernoulli Naïve Bayes model. What is your accuracy on the test set?
5. The results from step 4 will be disappointing: the classifier simply guessed the predominant class,
Hamilton, every time. The train data shape above shows 7876 unique words in the vocabulary. That
may be too many, and many of those words may not be helpful. Redo the vectorization with the
max_features option set to keep only the 1000 most frequent words, and add bigrams as features
alongside the single words. Try Naïve Bayes again on the new train/test vectors and compare your
results.
6. Try logistic regression. Adjust at least one parameter of the LogisticRegression() model to see if you
can improve on the results obtained with the default settings. What are your results?
7. Try a neural network. Try different topologies until you get good results. What is your final
accuracy?
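
The steps above could be sketched roughly as follows with pandas and scikit-learn. This is only one possible outline, not the assignment's reference solution. The text column name `text`, the helper names `vectorize` and `run_experiments`, the `class_weight="balanced"` setting, and the `(16, 8)` hidden-layer topology are all assumptions made for illustration; only the `author` column, the 80% split, random state 1234, `max_features`, and the use of bigrams come from the task list.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neural_network import MLPClassifier


def vectorize(train_text, test_text, **tfidf_kwargs):
    """Fit tf-idf on the training text only, then transform both splits."""
    vec = TfidfVectorizer(stop_words="english", **tfidf_kwargs)
    X_train = vec.fit_transform(train_text)  # vocabulary learned from train only
    X_test = vec.transform(test_text)        # test reuses that vocabulary
    return X_train, X_test


def run_experiments(df, text_col="text", label_col="author"):
    """Return test accuracy per model. Column names are assumed, not given."""
    df = df.copy()
    df[label_col] = df[label_col].astype("category")      # step 1
    X, y = df[text_col], df[label_col].cat.codes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=1234)           # step 2

    results = {}
    # Steps 3-4: full vocabulary, Bernoulli Naive Bayes.
    Xtr, Xte = vectorize(X_train, X_test)
    results["nb_full"] = BernoulliNB().fit(Xtr, y_train).score(Xte, y_test)

    # Step 5: cap the vocabulary at 1000 features, unigrams plus bigrams.
    Xtr, Xte = vectorize(X_train, X_test, max_features=1000, ngram_range=(1, 2))
    results["nb_1000_bigrams"] = BernoulliNB().fit(Xtr, y_train).score(Xte, y_test)

    # Step 6: logistic regression with one non-default setting (assumed choice).
    lr = LogisticRegression(class_weight="balanced", max_iter=1000)
    results["logreg"] = lr.fit(Xtr, y_train).score(Xte, y_test)

    # Step 7: small neural network; this topology is only a starting point.
    nn = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=1234)
    results["mlp"] = nn.fit(Xtr, y_train).score(Xte, y_test)
    return results
```

With the real data, `df` would come from `pd.read_csv` on the assignment's csv file, and `df.head()`, `df["author"].value_counts()`, and the `.shape` of each split would cover the display requests in steps 1 to 3. Because the classes are imbalanced, comparing each accuracy against the share of the majority class is a useful sanity check.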
