CSE354 - Assignment 4 - Natural Language Processing

Data

Colab Computing (Provides Free GPU Access)

Part 1: Music QA with RNNs (40 pts)

Part 2: Music QA with Transformers (60 pts)

Overview 
Goals.

·         Implement an RNN using PyTorch

·         Use a transformer with PyTorch/HuggingFace-transformers

·         Perform a task requiring sequence modeling

·         Fine-tune a transformer model

·         Build a basic QA system using transformers

·         Experience a shared task via Kaggle

General Requirements. This is a Python assignment.  You must use Python version 3.6 or later, along with PyTorch 1.4.0, Gensim 4.0, and transformers 4.5.1.  You may integrate any code from your previous assignments. 

Python Libraries.  No libraries beyond those listed below are permitted.  Of these libraries, you may not use any subcomponents that specifically implement a concept which the instructions indicate you should implement yourself.  The project can be completed without any additional libraries; however, if any additional libraries are deemed permissible, they will be listed here:

  torch-1.4.0

  json

  sys

  re

  numpy as np

  pandas  # only for data reading and storage (simpler not to use this)

  csv  # though it is just as easy to read the input without this



Data
Are transformer language models musical experts? In this assignment you will use a dataset of yes/no questions, along with passages of related information. Each example has some relation to music, often in the context of film or other art. The training and trial data are here: 

 
Train Data

Trial Data

The trial data can be used for testing hyper-parameters (such as the learning rate, weight decay (L2 penalty), and layer sizes). 

Test Data

The test data has been released without its labels. The idea of a shared task is that everyone develops their model on the training and trial data; the test data is then used to evaluate the models through the shared-task software (Kaggle in this case), so the test-set labels are not released to participants. You will submit predictions through the class shared task (on Kaggle; see the Part 2 checkpoint below), and the final accuracies and rankings will be announced after the class final exam. Extra points will be awarded to the entries with the highest accuracy on the unseen test data.

Each record has up to 4 pieces:

·         idx: a unique id for the record

·         question: the question to be answered

·         passage: a passage of related information (usually containing the answer to the question)

·         label: true/false -- the answer to the question (true = yes, false = no) 
(labels are only provided for the train and trial data; no labels are provided for the test data)

The data was originally introduced by the following paper, by Clark et al., 2019: 
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

You are encouraged to skim this paper for insights into approaching the task, but you are not permitted to obtain or utilize any other versions of this or any other Boolean question-answering dataset. You are permitted to use additional datasets for section 2.5 if you choose. 
Colab Computing (Provides Free GPU Access)
To create a Colab iPython notebook, go to https://colab.research.google.com/ and log in with your Stony Brook or Computer Science account. From there you can create a new Notebook and place it in your Google Drive. 
You will see a notebook called “Welcome to Colaboratory” which you can use as a tutorial to understand how notebooks work. There is also an official guide on YouTube: Get started with Google Colaboratory (Coding TensorFlow). You may also find this article helpful: Getting Started With Pytorch In Google Collab With Free GPU. 

If you want access to GPU or TPU compute, go to Runtime → Change runtime type → Hardware accelerator → GPU. We recommend you use GPU, since the amount of TPU compute time you get at the free tier is fairly limited. 
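Once the runtime is switched, a quick sanity check (a minimal sketch) confirms that PyTorch can see the GPU:

import torch

# Expect "cuda" once Hardware accelerator is set to GPU; falls back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)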

You can pull the data directly to your colab:
! wget -nc https://www3.cs.stonybrook.edu/~has/CSE354/music_QA_train.json

! wget -nc  https://www3.cs.stonybrook.edu/~has/CSE354/music_QA_dev.json

! kaggle competitions download -c cse-354-assignment-4   [use the Kaggle API]

When you train your models, please limit training to a maximum of 30 minutes in order to keep the playing field even for all students.
Part 1: Music QA with RNNs (40 pts)
1.1 Load the data. (5 pts)

·         Use the following code block to load all of the data into a list of dictionaries:

import json

data = []
with open(filename, 'r') as infile:   # filename: the train, trial, or test json file
    data = json.load(infile)
·         See the data section for definitions of the dictionary key-values (keys for each entry are 'idx', 'question', 'passage', and, for training and trial data, 'label').

1.2 Prepare to create word embeddings as input. (5 pts)

·         Since the focus of this assignment is on RNNs, you can start with pre-trained word embeddings. Use the 50-dimensional glove-wiki-gigaword-50 embeddings available from gensim:

import gensim.downloader as api

word_embs = api.load('glove-wiki-gigaword-50')

print(word_embs['music'])  # example of how to use: in gensim 4.0, index the KeyedVectors directly
print(word_embs['unk'])    # the vector used for unknown (out-of-vocabulary) words
The first time you run the load command, they will be downloaded to your computer (or cloud space) and saved. 
These embeddings are a much better representation of context than those we have made in the previous assignments.

·         Tokenize the passages and questions using gensim.utils.tokenize, as below: 

from gensim.utils import tokenize

…

for record in data:
  ...
  # tokenize returns a generator; wrap it in list() so the tokens can be re-used
  record['question_toks'] = list(tokenize(record['question'], lowercase=True))
  record['passage_toks'] = list(tokenize(record['passage'], lowercase=True))
1.3 Define and train the GRU-RNN in PyTorch. (20 pts)

·         As input, have your RNN read the token embeddings of the question appended to the end of the passage. For example, assuming get_embed(word) returns the word embedding if word exists, else the embedding for the 'unk' token (a sketch of such a helper follows the code):

input = [get_embed(word) for word in
         list(record['passage_toks']) + list(record['question_toks'])]
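get_embed is not defined in the handout; below is a minimal sketch of one way to write it, plus a helper (make_input, our own name) that packs the vectors into the (seq_len, batch, features) tensor that torch.nn.GRU expects. It assumes word_embs from section 1.2:

import numpy as np
import torch

def get_embed(word):
    # Fall back to the 'unk' vector for out-of-vocabulary words.
    return word_embs[word] if word in word_embs else word_embs['unk']

def make_input(record):
    vecs = [get_embed(word) for word in
            list(record['passage_toks']) + list(record['question_toks'])]
    # torch.nn.GRU expects (seq_len, batch, input_size); batch size is 1 here.
    return torch.tensor(np.array(vecs), dtype=torch.float32).unsqueeze(1)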
·         Define the forward pass for the GRU-based RNN such that:

1.    It takes the sequence of token embeddings as inputs 

2.    Sends those through a GRU layer (use torch.nn.GRU) 

·         input size is the word vector size (50)

·         hidden size can be anything but often using the same size as the word vectors works well

·         number of layers can be 1 or more (suggest starting with getting 1 working). 

3.    Sends the output of the final step of the GRU through a linear layer

4.    Finally, uses a softmax to output the labels (0 = false, 1 = true).
Note: this mirrors the "Sequence Labeling - PyTorch Model" slides from Topic 7 ("Applying RNNs to Document and Word Labeling"), except that only the final timestep's output is sent to the linear layer and softmax. 

·         Complete the loss function -- you can use BCELoss(), which is binary cross-entropy loss: equivalent to regular cross-entropy but more efficient for binary labels. 

·         Run training, then tune your learning rate, number of epochs [to control runtimes, do not use more than 10 epochs for the RNN], and weight_decay (L2 penalty) until the model gives high accuracy but does not overfit. A minimal sketch of the full model and training loop is below. 
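A minimal sketch of the pieces above (the class name, hidden size, and optimizer settings are our own starting points, not required values; make_input is the hypothetical helper sketched in 1.3):

import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, input_size=50, hidden_size=50, num_layers=1):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers=num_layers)
        self.linear = nn.Linear(hidden_size, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, seq):                       # seq: (seq_len, 1, input_size)
        output, _hidden = self.gru(seq)           # output: (seq_len, 1, hidden_size)
        final = output[-1]                        # only the final timestep: (1, hidden_size)
        return self.softmax(self.linear(final))   # (1, 2) probabilities for labels 0/1

model = GRUClassifier()
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

for epoch in range(10):                   # stay within the 10-epoch limit
    for record in data:                   # the training records from 1.1
        optimizer.zero_grad()
        probs = model(make_input(record))
        target = torch.tensor([1.0 if record['label'] else 0.0])
        loss = loss_fn(probs[:, 1], target)   # BCELoss on p(label = 1)
        loss.backward()
        optimizer.step()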

1.4 Output the accuracy against the trial data. (10 pts)

·         Run the final trained model on trial data to get predicted labels for each example.

·         Find the accuracy of the predicted labels compared to the provided true labels. [Accuracy should be at least 0.717 -- as good as always predicting the most frequent class, 1 (true).] A sketch of this check is below.
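A sketch of that check, reusing the hypothetical make_input helper from the 1.3 sketch (trial_data is the trial file, loaded the same way as in 1.1):

import torch

correct = 0
with torch.no_grad():
    for record in trial_data:
        probs = model(make_input(record))
        pred = bool(probs[0, 1] > 0.5)        # predicted label: true iff p(1) > 0.5
        correct += int(pred == record['label'])
print('trial accuracy:', correct / len(trial_data))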

Hints 

·         Training the GRU RNN with pytorch is just like creating the logistic regression as in assignment 2. You need the same ingredients:

·         define the __init__ and forward methods in a class that inherits from nn.Module

·         define the loss function (same as assignment 2 or you can use BCELoss to be faster)

·         define and run the training loop

              with 2 key differences:

·         each observation in your input (X) is now a sequence of vectors (i.e., word vectors from GloVe) rather than a single feature vector. 

·         your forward pass needs a GRU layer and to iterate over the multiple timesteps
Part 2: Music QA with Transformers (60 pts)
2.1 Create baseline transformers: question-only and passage-only (20 pts)

As baselines, you will first create two transformer models: 

·         one that runs on only the question

·         one that runs on only the passage 

You will fine-tune the transformer network bert-base-uncased to the task. 

·         Limit your input text to either the passage or the question.  

·         Use the AutoTokenizer and AutoModelForSequenceClassification for bert-base-uncased:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
·         For preprocessing the training data, follow either:
 1) the Hugging Face Tutorial for Text Classification (easier), or
 2) the Hugging Face Tutorial for Question Answering (more powerful).
You will need to adapt the tutorial code as follows:

·         Make sure your data is in the format these tutorials expect: idx, label, and sentence (either the passage or the question). 

·         Use only the questions or only the passages (no "[SEP]" joining the two)

·         Make sure to train two separate models

·         Use BCELoss() since it's a binary outcome

·         One rarely needs more than 3 to 5 epochs for fine-tuning

Any code copied directly from the tutorials should be commented with:
# Taken from Hugging Face [SQuAD|Sequence Classification] Tutorial
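A minimal sketch of the question-only baseline under these constraints (the max_length, learning rate, and single-example batches are our own simplifications; the tutorials above show proper batching):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 gives a single logit per example, squashed with sigmoid for BCELoss.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):        # 3 to 5 epochs is usually enough for fine-tuning
    for record in data:       # training records from 1.1
        # Question-only baseline; swap in record['passage'] for the passage-only model.
        enc = tokenizer(record['question'], truncation=True,
                        max_length=256, return_tensors='pt').to(device)
        optimizer.zero_grad()
        prob = torch.sigmoid(model(**enc).logits.squeeze(-1))   # shape (1,)
        target = torch.tensor([1.0 if record['label'] else 0.0], device=device)
        loss = loss_fn(prob, target)
        loss.backward()
        optimizer.step()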

2.2 Test the baseline transformers (5 pts)

·         Run each of the 2 trained baseline transformer models on the trial data to get predicted labels for each example.

·         Find the accuracy of the predicted labels compared to the provided true labels.  
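A sketch of that evaluation, mirroring 1.4 (the 0.5 threshold is our choice; tokenizer, model, and device are as in the 2.1 sketch):

import torch

model.eval()
correct = 0
with torch.no_grad():
    for record in trial_data:
        enc = tokenizer(record['question'], truncation=True,
                        max_length=256, return_tensors='pt').to(device)
        prob = torch.sigmoid(model(**enc).logits.squeeze(-1))
        correct += int(bool(prob > 0.5) == record['label'])
print('trial accuracy:', correct / len(trial_data))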

2.3 Create full question-passage transformer (10 pts)

"[CLS] Question [SEP] Passage [SEP]"

2.4 Test question-passage transformer (5 pts) 

·         Run the full question-passage transformer model on the trial data to get predicted labels for each example.

·         Find the accuracy of the predicted labels compared to the provided true labels. 

2.5 Ask your transformer three yes/no questions. (5 pts) 

·         Make up three of your own questions to ask your BoolQ transformer. 

·         Report the questions and answers in your output text file under a header called 
"Three Questions:" (a sketch of one query is below) 

2.6 Attempt to improve your transformer and run on "test data". (15 pts)

·         Create your final transformer by adjusting the model described above. 
Some ideas for things to try:

·         different types of pooling from the transformer output (mean, median, min, max)

·         fine-tune other pre-trained transformers instead of BERT

·         Take an off-the-shelf NLI (natural language inference) model and apply it. This could be done either:

·         zero-shot on the BoolQA task -- one can turn Question => Hypothesis and Context => Premise (a sketch follows below), or

·         fine-tuned -- start from the NLI model and then fine-tune it on the music-related BoolQA training examples for the task itself. 
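A sketch of the zero-shot route (the roberta-large-mnli checkpoint and its label order are our assumptions; verify via nli_model.config.id2label for whichever checkpoint you use):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

# Premise = passage (context); hypothesis = the question, rephrased as a statement.
enc = nli_tok(record['passage'], record['question'], truncation=True, return_tensors='pt')
with torch.no_grad():
    probs = torch.softmax(nli_model(**enc).logits, dim=-1)[0]
# roberta-large-mnli orders its labels (contradiction, neutral, entailment).
pred = bool(probs[2] > probs[0])   # entailment outweighs contradiction -> answer "yes"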
