The goal of this lab session is to learn a named entity recogniser (NER) using the structured perceptron. For each word in a sequence, the named entity recogniser should predict one of the following labels:
• O: not part of a named entity
• PER: part of a person’s name
• LOC: part of a location’s name
• ORG: part of an organisation’s name
• MISC: part of a name of a different type (miscellaneous, i.e. not a person, location or organisation)
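For example, in the (illustrative) sentence “Marie Curie visited Warsaw”, the words Marie and Curie would be tagged PER, Warsaw LOC, and visited O.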
Load the Data
The training and the test data are available in this folder (use your university Google ID to access it): data. It consists of sentences of up to 5 words each, taken from the data used in this shared task on NER.
After downloading the data, you can obtain the word and tag sequences for each sentence using the following function:
def load_dataset_sents(file_path, as_zip=True, to_idx=False, token_vocab=None, target_vocab=None):
    targets = []
    inputs = []
    zip_inps = []
    with open(file_path) as f:
        for line in f:
            sent, tags = line.split('\t')
            words = [token_vocab[w.strip()] if to_idx else w.strip() for w in sent.split()]
            ner_tags = [target_vocab[w.strip()] if to_idx else w.strip() for w in tags.split()]
            inputs.append(words)
            targets.append(ner_tags)
            zip_inps.append(list(zip(words, ner_tags)))
    return zip_inps if as_zip else (inputs, targets)
e.g., train_data = load_dataset_sents('train-sents.txt')
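With the default arguments each sentence is returned as a list of (word, tag) pairs, so train_data is a list of such lists. For illustration only (a made-up sentence, not an actual line from the dataset):

train_data[0]
# e.g. [('Marie', 'PER'), ('Curie', 'PER'), ('visited', 'O'), ('Warsaw', 'LOC')]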
Feature Extraction
After having loaded the data, you need to write functions that extract and transform a sentence into different feature types (this is the Φ(x,y) in the perceptron algorithm below), represented as Python dictionaries (see lecture 4, part I slides), for a given sentence:
1. Current word-current label, Φ1(x,y) (1 mark):
• (1) Write a function that takes as input a corpus c = [s1,s2,...,sn], where si = [(x1,y1),...,(xm,ym)] is a sentence consisting of pairs of words x and tags y, and returns a Python dictionary containing current word-current label counts for each pair of words and tags in the corpus, e.g. cw_cl_counts = {‘x1_y1’:103, ‘x2_y2’:12, ...}. Tip: Keep only the features (dict. keys) that have a frequency greater than or equal to 3 (you can delete keys in a Python dictionary) if your feature space is too big to keep in memory.
• (2) Write a function phi_1(x, y, cw_cl_counts) that takes as input a sentence (x, y) and returns a dictionary with the counts of the cw_cl_counts keys occurring in that sentence (see the sketch after the tips below).
2. Previous label-current label Φ2(x,y) (1 mark): You need to write two functions following a two-step approach similar to the one you used for Φ1(x,y) above. After you have the feature extraction functionality in place, you will need to combine Φ2 with Φ1 to see whether the new feature type improves performance. This can be done by merging the two Python dictionaries you obtain for each feature type.
3. Optional Bonus (2 marks): Add at least two more feature types (Φ3(x,y) and Φ4(x,y)) of your choice following the same approach as for Φ1, and combine them with Φ1 and Φ2 to test whether you obtain improved performance. Ideas: sub-word features, previous/next words, label trigrams, etc.
Tip: For the feature extraction part, you can use Counter (from the collections module) and Python dictionaries to store the feature names as keys with their corresponding values (see the slides of lecture 4.1). Tip++: The feature representation should be obtained by looking only at the training data, otherwise your model is cheating!
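A minimal sketch of the two Φ1 functions is given below, assuming the word_tag key format of the example above; the helper name get_cw_cl_counts, the frequency threshold and the other details are one possible choice, not a prescribed implementation.

from collections import Counter

def get_cw_cl_counts(corpus, min_freq=3):
    # corpus = [s1, ..., sn], each si = [(x1, y1), ..., (xm, ym)]
    counts = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            counts[word + '_' + tag] += 1
    # keep only features observed at least min_freq times
    return {k: v for k, v in counts.items() if v >= min_freq}

def phi_1(x, y, cw_cl_counts):
    # counts of the cw_cl_counts keys occurring in the sentence (x, y)
    phi = Counter()
    for word, tag in zip(x, y):
        key = word + '_' + tag
        if key in cw_cl_counts:
            phi[key] += 1
    return dict(phi)

Φ2 can follow the same two-step pattern over (previous label, current label) pairs (using a special symbol such as 'START' for the first word); merging the dictionaries returned by phi_1 and phi_2 (e.g. with dict.update) then gives the combined Φ1+Φ2 representation.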
Implement the Perceptron
Finally, after you have your feature extraction mechanism in place, you need to implement a structured perceptron (2 marks). This has to be done using two functions. One function is called train and takes as input the training data, following the pseudocode below:
Input: Dtrain = {(x1,y1), ..., (xM,yM)}
set w = 0
for (x,y) ∈ Dtrain do
    predict ŷ = argmax_{y ∈ Y^N} w · Φ(x,y)
    if ŷ ≠ y then
        update w = w + Φ(x,y) − Φ(x,ŷ)
return w
The second function should be called predict; it takes as input the current weights w and the feature representation of a sentence (Φ(x,y)), and returns the predicted tag sequence ŷ using the argmax over all the possible tag sequences.
During training of the structured perceptron, you are advised to use multiple passes with randomised order and averaging.
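A minimal sketch of both functions is given below, assuming phi is any feature function phi(x, y) that returns a dictionary of counts (for Φ1 you could pass, say, lambda x, y: phi_1(x, y, cw_cl_counts)); the names, signatures and averaging scheme are illustrative assumptions, not a prescribed implementation.

import random
from itertools import product

TAGS = ['O', 'PER', 'LOC', 'ORG', 'MISC']

def score(w, features):
    # dot product w · Φ(x, y) over sparse dictionaries
    return sum(w.get(f, 0.0) * v for f, v in features.items())

def predict(w, x, phi):
    # argmax over all possible tag sequences for the words in x
    best_y, best_score = None, float('-inf')
    for y in product(TAGS, repeat=len(x)):
        s = score(w, phi(x, list(y)))
        if s > best_score:
            best_y, best_score = list(y), s
    return best_y

def train(train_data, phi, epochs=5, seed=0):
    random.seed(seed)
    data = list(train_data)
    w, w_sum, steps = {}, {}, 0
    for _ in range(epochs):
        random.shuffle(data)                           # randomised order on each pass
        for sentence in data:
            x = [word for word, tag in sentence]
            y = [tag for word, tag in sentence]
            y_hat = predict(w, x, phi)
            if y_hat != y:
                # w = w + Φ(x, y) − Φ(x, ŷ)
                for f, v in phi(x, y).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in phi(x, y_hat).items():
                    w[f] = w.get(f, 0.0) - v
            # accumulate the current weights after every example for averaging
            for f, v in w.items():
                w_sum[f] = w_sum.get(f, 0.0) + v
            steps += 1
    return {f: v / steps for f, v in w_sum.items()}    # averaged weights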
Evaluate the Perceptron with Different Feature Types
Write a test function to evaluate the perceptron. The input should be the trained weights (w) of your perceptron and the test data. As the dataset is imbalanced (most words are not part of a named entity), the evaluation metric to use is the micro-averaged F1 score from scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html. You should call it in your code as:
f1_micro = f1_score(correct, predicted, average='micro', labels=['ORG', 'MISC', 'PER', 'LOC'])
where correct and predicted are arrays containing the gold and the predicted labels. These arrays are flattened, i.e. they contain the tags of all words in all test sentences (gold and predicted respectively).
Tip: You should re-use the predict function to get the prediction for a given test sentence by iterating over all the possible tag sequences for its words.
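One possible shape for the test function, assuming the predict function sketched in the previous section; the gold and predicted tags are flattened across all test sentences as described above.

from sklearn.metrics import f1_score

def test(w, test_data, phi):
    correct, predicted = [], []
    for sentence in test_data:
        x = [word for word, tag in sentence]
        y = [tag for word, tag in sentence]
        y_hat = predict(w, x, phi)      # re-use predict from the perceptron
        correct.extend(y)               # flattened gold tags
        predicted.extend(y_hat)         # flattened predicted tags
    return f1_score(correct, predicted, average='micro',
                    labels=['ORG', 'MISC', 'PER', 'LOC'])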
For each feature set you use (Φ1, Φ1+Φ2 and the bonus features if you opt in), you need to answer the following questions:
• What is the F1 score you obtain for each feature set?
• What are the most positively-weighted features for each feature set? Give the top 10 for each class (tag) and comment on whether they make sense (if they don’t, you might have a bug). One possible way to extract them is sketched below.
• Are the differences among the feature sets in micro-F1 score expected? Did the features you propose improve the results? Why?
Correct answers to questions count 2 marks for each feature set you have implemented (+1 for the bonus features).
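One possible way to extract the most positively-weighted features per tag, assuming feature keys end in '_<tag>' as in the Φ1 sketch above (adapt the suffix matching to your own key format):

def top_features(w, tag, n=10):
    # n highest-weighted features whose key ends with the given tag
    feats = [(f, v) for f, v in w.items() if f.endswith('_' + tag)]
    return sorted(feats, key=lambda fv: fv[1], reverse=True)[:n]

# e.g. for tag in ['PER', 'LOC', 'ORG', 'MISC', 'O']: print(tag, top_features(w, tag))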
Tip: You can use barplots (see Python’s matplotlib library) and tables to visualise/summarise the results in your report.
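For instance, a small helper along these lines (matplotlib assumed to be installed; the results dictionary is to be filled with your own micro-F1 scores):

import matplotlib.pyplot as plt

def plot_scores(results):
    # results = {'Φ1': ..., 'Φ1+Φ2': ...} mapping feature sets to micro-F1 scores
    plt.bar(list(results.keys()), list(results.values()))
    plt.ylabel('micro-F1')
    plt.title('NER performance per feature set')
    plt.show()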