CA4023 Assignment 2

PART ONE 
A baseline sentiment analysis system which uses logistic regression is shown in this 
notebook. This system obtains ~77% accuracy when trained on 80% of the training section 
of the IMDb movie review dataset, and tested on the remaining 20%. 
Your goal is to attempt to improve the accuracy of the baseline model by trying out TWO ideas. 
Your new model should still use Logistic Regression. 
Your experiments should be documented in a Jupyter notebook. 
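For orientation, a minimal sketch of what such a logistic regression baseline might look like is given below. It assumes the review texts and binary labels have already been loaded into the hypothetical variables texts and labels; the provided notebook's feature extraction and settings may differ.

# Illustrative sketch of a bag-of-words logistic regression baseline
# (assumes `texts` is a list of review strings and `labels` a list of 0/1 labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 80/20 split of the training section of the dataset
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

vectorizer = CountVectorizer()                  # unigram count features
classifier = LogisticRegression(max_iter=1000)  # the required classifier

classifier.fit(vectorizer.fit_transform(train_texts), train_labels)
predictions = classifier.predict(vectorizer.transform(test_texts))
print("Accuracy:", accuracy_score(test_labels, predictions))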
Marking Criteria 
Marks will be awarded for 
1. Two ideas for improving the results 
2. Implementation of ideas 
3. Description of experiments 
4. Clear, readable code 
PART TWO 
BERT is a neural network language model architecture introduced by Google in 2018 (Devlin 
et al. 2018). When training a BERT model, the network is trained not to predict the next 
token in a sequence but to predict a masked token as in a cloze test, e.g. 
Predict the MASK in this sequence 
It was the best of times, it was the MASK of times 
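As a concrete illustration of this objective (not part of the assignment code), the fill-mask pipeline from the HuggingFace transformers library can query a pretrained BERT checkpoint for the masked word; bert-base-uncased is just one of the standard English checkpoints.

# Sketch: ask a pretrained BERT model to fill in the masked token.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("It was the best of times, it was the [MASK] of times."):
    print(candidate["token_str"], round(candidate["score"], 3))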
A secondary training objective is to predict whether two sequences follow on from each 
other, e.g. 
Input Sequence 1 My father’s family name being Pirrip, and my Christian name Philip, my infant 
tongue could make of both names nothing longer or more explicit than Pip. So, I called myself 
Pip, and came to be called Pip. 
Input Sequence 2 I give Pirrip as my father’s family name, on the authority of his tombstone and 
my sister — Mrs. Joe Gargery, who married the blacksmith. As I never saw my father or my 
mother, and never saw any likeness of either of them (for their days were long before the days of photographs), my first fancies regarding what they were like, were unreasonably derived from 
their tombstones. 
Output: TRUE 
Sequence 1 My father’s family name being Pirrip, and my Christian name Philip, my infant 
tongue could make of both names nothing longer or more explicit than Pip. So, I called myself 
Pip, and came to be called Pip. 
Sequence 2 It was the best of times, it was the worst of times, it was the age of wisdom, it 
was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was 
the season of Light, it was the season of Darkness, it was the spring of hope, it was the 
winter of despair, we had everything before us, we had nothing before us, we were all going 
direct to Heaven, we were all going direct the other way – in short, the period was so far like 
the present period, that some of its noisiest authorities insisted on its being received, for 
good or for evil, in the superlative degree of comparison only. 
Output: FALSE 
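A small sketch of how this next-sentence objective can be probed with the pretrained model in the transformers library is shown below; the input sequences are abbreviated here, and the output is a probability for each of the two classes rather than a hard TRUE/FALSE decision.

# Sketch: score whether sequence B follows sequence A under BERT's
# next-sentence-prediction head.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

seq_a = ("My father's family name being Pirrip, and my Christian name Philip, "
         "my infant tongue could make of both names nothing longer than Pip.")
seq_b = ("I give Pirrip as my father's family name, on the authority of his "
         "tombstone and my sister.")

inputs = tokenizer(seq_a, seq_b, return_tensors="pt")
logits = model(**inputs).logits
# Index 0 = "sequence B follows sequence A", index 1 = "sequence B is random"
print(torch.softmax(logits, dim=-1))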
BERT represents tokens as subword units, which can be either full words or word pieces, e.g. 
comparison, compar, ##ison. 
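This can be seen directly by running the tokenizer; the exact splits depend on the vocabulary of the particular checkpoint, so no specific output is guaranteed.

# Sketch: inspect how the BERT tokenizer breaks text into subword units.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Pieces prefixed with ## continue the preceding word; common words stay whole.
print(tokenizer.tokenize("It was the epoch of incredulity, the superlative degree of comparison."))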
BERT has become hugely popular because you can take a pretrained model, add a layer on 
top and fine-tune BERT for any NLP task (question answering, sentiment analysis, named 
entity recognition, and so on). The amount of labelled data needed for this fine-tuning is 
typically orders of magnitude smaller than the amount of data that BERT is pre-trained on. 
Several versions of BERT are available for English. There are versions available for many 
other languages too, plus a multilingual model trained on all languages for which Wikipedia 
is available. 
Here is a Jupyter notebook which shows how BERT can be fine-tuned on a small subset of 
the same movie review dataset that was used in Part One. It takes roughly 20 mins to train 
on Google Colab with one GPU. To run it using a GPU, select Edit -> Notebook Settings -> 
GPU. 
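For orientation, a compact sketch of the kind of fine-tuning code such a notebook typically contains is shown below; the subset sizes, hyperparameters and use of the datasets library are illustrative assumptions, not the notebook's exact settings.

# Sketch: fine-tune BERT for binary sentiment classification on a small
# subset of the IMDb dataset using the HuggingFace Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Reviews longer than 512 tokens are truncated to BERT's maximum input length.
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=512)

imdb = load_dataset("imdb")
train_set = imdb["train"].shuffle(seed=0).select(range(2000)).map(tokenize, batched=True)
eval_set = imdb["test"].shuffle(seed=0).select(range(500)).map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_set,
        eval_dataset=eval_set).train()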
What you have to do 
The aim of this assignment is for you to gain experience in using BERT. 
Choose one of the following: 
1. Analysis 
Compare the output of the fine-tuned BERT model and your logistic regression model 
from Part One. Try to answer the following questions: 
● What are the strengths and weaknesses of each model? 
● What do both models have trouble with? Can you invent examples that 
“break” both models? 
● What do both models do well? 
2. Review Text Selection 
A disadvantage of BERT is that there is a limit on the number of tokens that can be in the 
input. The model in the notebook uses the default limit of 512 tokens, which means that the 
review text is truncated after 512 tokens. 
Attempt to improve the BERT model by using an intelligent approach to selecting the 
part of the review to include. But note that BERT expects fluent text rather than a bag 
of words. 
3. Fine-tune BERT on a different task/dataset. 
Find a labelled dataset and fine-tune BERT on it. Here are some datasets that are 
available through HuggingFace 
https://github.com/huggingface/datasets/tree/master/datasets 
Caveat: don’t choose a dataset which has already been used as an example in an 
online tutorial, e.g. 
https://huggingface.co/transformers/custom_datasets.html 
4. Something else BERT-related 
Just run the idea by me first! 
