CS4395 Homework 4: Language Models

Use n-gram models for text analysis. 

In this homework you will create unigram and bigram dictionaries for English, French, and Italian from the provided training data. In each dictionary, the key is the unigram or bigram text and the value is the number of times that unigram or bigram occurs in the data. Then, for each line of the test data, calculate its probability under each language's model and compare the most probable language against the true label.
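
As a rough illustration of the final comparison step, the sketch below scores one test line against each language's count dictionaries and picks the most probable language. The add-one (Laplace) smoothing, the whitespace tokenizer, the toy counts, and the names `line_log_prob` and `guess_language` are all assumptions made for this example; the text above does not specify them.

```python
import math

def line_log_prob(tokens, unigram_counts, bigram_counts, vocab_size):
    """Log-probability of a token list under one language's bigram model.

    Assumes add-one (Laplace) smoothing:
    (count(bigram) + 1) / (count(first word) + vocab_size)
    """
    log_p = 0.0
    for first, second in zip(tokens, tokens[1:]):
        b = bigram_counts.get((first, second), 0)
        u = unigram_counts.get(first, 0)
        log_p += math.log((b + 1) / (u + vocab_size))
    return log_p

def guess_language(line, models, vocab_size):
    """Return the language whose model gives `line` the highest log-probability.

    `models` maps a language name to (unigram_counts, bigram_counts).
    """
    tokens = line.split()  # assumption: simple whitespace tokenization
    return max(
        models,
        key=lambda lang: line_log_prob(tokens, *models[lang], vocab_size),
    )

# Tiny made-up counts, only to show the call shape.
models = {
    "English": ({"the": 2, "cat": 1}, {("the", "cat"): 1}),
    "French": ({"le": 2, "chat": 1}, {("le", "chat"): 1}),
}
vocab_size = 4  # distinct unigrams across both toy models

print(guess_language("the cat", models, vocab_size))  # English
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long lines. Accuracy would then be the fraction of test lines whose guessed language matches the true label.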

 

Instructions:

 

1.      Program 1: Build language models for 3 languages as follows.       

a.      create a function with a filename as argument

b.      read in the text and remove newlines

c.      tokenize the text

d.      use nltk to create a bigrams list

e.      create a unigrams list
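
Steps a through e could be sketched roughly as follows. To stay runnable without extra downloads, this sketch uses only the standard library: `text.split()` stands in for an NLTK tokenizer and `zip(tokens, tokens[1:])` stands in for `nltk.bigrams`. The function name `build_language_model`, the UTF-8 encoding, and the choice to replace newlines with spaces (so words on adjacent lines do not run together) are assumptions for the example.

```python
from collections import Counter

def build_language_model(filename):
    """Return (unigram_counts, bigram_counts) for the text in `filename`."""
    # a, b. Read the text and replace newlines with spaces.
    with open(filename, encoding="utf-8") as f:
        text = f.read().replace("\n", " ")

    # c. Tokenize. Stand-in for nltk.word_tokenize(text).
    tokens = text.split()

    # d. Bigrams list. Stand-in for list(nltk.bigrams(tokens)).
    bigrams = list(zip(tokens, tokens[1:]))

    # e. Unigrams list: the tokens themselves.
    unigrams = tokens

    # Count occurrences; Counter is a dict subclass mapping item -> count.
    return Counter(unigrams), Counter(bigrams)
```

Calling this function once per training file would give one pair of count dictionaries per language, with bigram keys stored as `(first, second)` tuples.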
