COMS4705 - Homework 1 - Code

1 Edit Model
Let’s begin by creating the backbone of the autocorrect: the edit model. Your changes will occur inside EditModel.py. Given a word, you will generate a list of 1-distance edits: deletions, insertions, replacements, and transpositions. Refer to the noisy channel model slides. This list will be used by the method editProbabilities to distribute probability mass based on the table of edit counts from Wikipedia, count1 edit.txt.

What You Need To Do
Implement the following methods.

•    insertEdits: all edits formed by inserting a character into the word.

•    transposeEdits: all edits formed by transposing two adjacent characters in the word.

•    replaceEdits: all edits formed by replacing a character with another.

We provided deleteEdits, which demonstrates how to construct a set of candidate edits (new words) by deleting a character from the original word. You will implement the same idea for insertions, transpositions, and replacements. Note the use of < when there is no previous character to condition on, for example when deleting the first character. Make sure you run the unit tests on your functions before moving on; this can be done by running main in EditModel.py. Make sure your edit sets have 100% overlap with the expected output.
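As a sketch of the idea, the three generators can look like the following (this uses plain strings rather than the assignment's edit objects, and assumes a lowercase alphabet; the real methods must also track the conditioning characters for editProbabilities):

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def insert_edits(word):
    """All strings formed by inserting one letter at any position."""
    return [word[:i] + c + word[i:]
            for i in range(len(word) + 1)
            for c in ALPHABET]

def transpose_edits(word):
    """All strings formed by swapping two adjacent characters."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]

def replace_edits(word):
    """All strings formed by replacing one character with a different one."""
    return [word[:i] + c + word[i + 1:]
            for i in range(len(word))
            for c in ALPHABET
            if c != word[i]]
```

A word of length n yields 26(n+1) insertions, n-1 transpositions, and 25n replacements, which is a quick sanity check for your unit tests.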

2 Spelling Correction
Now that we’ve implemented the edit model, we can use it within the spelling corrector, SpellCorrect.py. The method correctSentence takes a sentence containing an error and tests candidate edits from the edit model. It iterates through all candidate edits for each word in the sentence and scores each new sentence, weighting the language-model score by the probability of the edit occurring. In SpellCorrect.py’s main, each language model is evaluated. We have provided two simple language models: uniform and unigram. We will improve on these later by creating a smoothed unigram, a smoothed bigram, a backoff smoothed bigram, and a custom model of your choice.

What You Need To Do
Construct the method correctSentence as described above, using the edit model and the instantiated language model, and return the most likely “correct” sentence.
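A minimal sketch of this loop, assuming editProbabilities returns (candidate, probability) pairs and that the language model's score is a log-probability (both are interface assumptions for illustration, not the assignment's exact API):

```python
import math

def correct_sentence(sentence, edit_model, language_model):
    """Return the highest-scoring sentence obtained by editing one word.

    Assumes edit_model.editProbabilities(word) yields (candidate, prob)
    pairs and language_model.score(words) returns a log-probability.
    """
    if not sentence:
        return sentence
    best, best_score = list(sentence), float("-inf")
    for i, word in enumerate(sentence):
        for candidate, edit_prob in edit_model.editProbabilities(word):
            if edit_prob <= 0.0:
                continue  # impossible edit; log would be undefined
            trial = sentence[:i] + [candidate] + sentence[i + 1:]
            # Combine channel and language-model scores in log space.
            score = math.log(edit_prob) + language_model.score(trial)
            if score > best_score:
                best, best_score = trial, score
    return best
```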

3 Extended Language Models
In this section, we will use various language models to get better results.

What You Need To Do
Implement the following language models.

•    Smoothed Unigram Language Model: a unigram model with add-one smoothing, otherwise known as a Laplace unigram language model.

•    Smoothed Bigram Language Model: a bigram model with add-one smoothing.

•    Backoff Language Model: use an unsmoothed bigram model with “backoff”. The backoff function will be a smoothed unigram model.

•    Your choice of a language model: ideas include interpolated Kneser-Ney, linear interpolation, a trigram model, or any other model. Train and test with the same data supplied to the other models.
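For the backoff model, one possible shape of the score computation is sketched below (the argument names and the `<s>` start symbol are assumptions for illustration, not the assignment's interface):

```python
import math

def backoff_score(sentence, bigram_counts, context_counts, unigram_logp):
    """Score a sentence with an unsmoothed bigram model, backing off to
    a smoothed unigram log-probability whenever a bigram was never seen.

    bigram_counts maps (prev, word) pairs to counts, context_counts maps
    each context word to its total count, and unigram_logp(word) returns
    the smoothed unigram log-probability.
    """
    logp = 0.0
    prev = "<s>"  # sentence-start symbol (an assumption)
    for word in sentence:
        count = bigram_counts.get((prev, word), 0)
        if count > 0:
            # Unsmoothed bigram probability: count(prev, word) / count(prev).
            logp += math.log(count / context_counts[prev])
        else:
            logp += unigram_logp(word)  # back off to the smoothed unigram
        prev = word
    return logp
```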

Use UnigramModel.py as an example of how to create the train and score functions for each model. Note: treat all unknown words not in your training data as a single word, UNK, that appears in the training vocabulary with a frequency of 0. This allows your trained model to handle words it has not seen before.
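Putting the UNK convention together with add-one smoothing, a smoothed unigram model might look like the following sketch (the class and attribute names are illustrative, not the assignment's exact interface; the corpus is assumed to be an iterable of word lists):

```python
import math
from collections import Counter

class LaplaceUnigramModel:
    """Add-one smoothed unigram model; unseen words map to UNK."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0
        self.vocab_size = 0

    def train(self, corpus):
        # corpus: an iterable of sentences (lists of words).
        for sentence in corpus:
            for word in sentence:
                self.counts[word] += 1
                self.total += 1
        self.counts["UNK"] += 0  # UNK joins the vocabulary with count 0
        self.vocab_size = len(self.counts)

    def score(self, sentence):
        # Accumulate log-probabilities to avoid underflow.
        logp = 0.0
        for word in sentence:
            if word not in self.counts:
                word = "UNK"
            logp += math.log((self.counts[word] + 1) /
                             (self.total + self.vocab_size))
        return logp
```

Because UNK is in the vocabulary with count 0, an unseen word receives the nonzero add-one probability 1 / (N + V) rather than breaking the log.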

All language models have two functions: train and score.

•    train(Corpus c): This function takes in a data corpus and iterates through each sentence and word, computing the counts of their occurrences or any other statistics necessary for the specific model. Note that a datum consists of a word and error pair.

•    score(List sentence): This function takes a list of strings that compose the sentence and returns the probability of the sentence containing the candidate edit from your correctSentence function. Rather than multiplying all the probabilities together, accumulate the sum of their logs (which is equivalent) to avoid underflow problems.
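A quick illustration of why log space matters: multiplying even a moderate number of small probabilities underflows double-precision floats to 0.0, while the equivalent sum of logs stays comfortably representable.

```python
import math

probs = [1e-5] * 100  # e.g. 100 words, each with probability 1e-5

product = 1.0
for p in probs:
    product *= p
# product is now exactly 0.0: 1e-500 is far below the smallest
# representable double (~4.9e-324), so the product underflows.

log_sum = sum(math.log(p) for p in probs)
# log_sum is about -1151.3, a perfectly ordinary float.
```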

In score, use whatever statistics you stored during train.

•    Implement your custom model in CustomModel.py.
