Overview
• Sentiment classification is the automated process of identifying opinions in text and labeling them as positive, negative, or neutral, based on the emotions customers express within them.
• In this assignment, you will train a recurrent neural network (RNN) or fine-tune a pre-trained language model (e.g., BERT) to predict the sentiment of a given tweet.
• You may use a pre-trained model.
Dataset
• Twitter US Airline Sentiment from Kaggle
• Tweets from February 2015 about each major U.S. airline were scraped
• Contributors were asked to first classify each tweet as positive, negative, or neutral, and then to categorize the reason behind each negative tweet.
• This assignment dataset link
• We resampled the data and split it into three sets: train, val, and test
• Sentiment labels are replaced by integers: (positive, 2), (neutral, 1), (negative, 0)
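The label replacement above can be sketched in a few lines. This is only an illustration: the names `LABEL_TO_ID`, `ID_TO_LABEL`, and `encode_labels` are hypothetical, and the provided CSV files may already contain the integer labels.

```python
# Label encoding stated in the assignment:
# positive -> 2, neutral -> 1, negative -> 0.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def encode_labels(sentiments):
    """Map raw sentiment strings to integer class ids (case-insensitive)."""
    return [LABEL_TO_ID[s.strip().lower()] for s in sentiments]


# Hypothetical raw labels, for illustration only.
raw = ["positive", "neutral", "negative", "Negative"]
encoded = encode_labels(raw)
print(encoded)
```

Keeping the inverse map `ID_TO_LABEL` around makes it easy to turn predicted class ids back into readable labels when inspecting results.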
Your task
• Skeleton code: https://colab.research.google.com/drive/1i6bqF82EbMY7dnLYuPWM_o0DcF2ceuLx
• Use word embeddings to represent words
• You can use torch.nn.Embedding to learn word embeddings
• Example: LSTM for part-of-speech tagging
• Or use pre-trained GloVe or fastText word embeddings for better performance
• Example: torchtext, Deep Learning For NLP with PyTorch and Torchtext
• Note: You need to use all text (train, val, test) to build the word embeddings
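One possible shape for the embedding route is sketched below: a vocabulary built from all three splits, a learned `torch.nn.Embedding`, and an LSTM with a three-class output layer. This is not the skeleton code. The whitespace tokenizer, the toy tweets, the helper names (`build_vocab`, `encode`, `LSTMClassifier`), and all layer sizes are assumptions chosen for illustration.

```python
from collections import Counter

import torch
import torch.nn as nn

PAD, UNK = "<pad>", "<unk>"


def build_vocab(*splits, min_freq=1):
    """Build a token -> index map from every split passed in."""
    counts = Counter(
        tok for split in splits for text in split for tok in text.lower().split()
    )
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert a tweet to a fixed-length list of token ids (truncate or pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


class LSTMClassifier(nn.Module):
    """Embedding -> LSTM -> linear layer over three sentiment classes."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=64, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, embed_dim)
        # h_n is the hidden state after the last time step, padding included;
        # packing the sequences would be a refinement that skips the padding.
        _, (h_n, _) = self.lstm(embedded)      # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                # (batch, num_classes)


# Hypothetical toy tweets standing in for the train, val, and test splits.
train_texts = ["great flight thanks", "worst delay ever"]
val_texts = ["flight was ok"]
test_texts = ["lost my bag again"]

# The vocabulary covers all three splits, as the note above requires.
vocab = build_vocab(train_texts, val_texts, test_texts)
batch = torch.tensor([encode(t, vocab) for t in train_texts])
model = LSTMClassifier(len(vocab))
logits = model(batch)
print(logits.shape)
```

To use pre-trained GloVe or fastText vectors instead of learned ones, the embedding layer could be created with `nn.Embedding.from_pretrained` from a weight matrix whose rows follow the same vocabulary order.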
• Using a pre-trained model of your choice, you are to build a deep network that predicts the sentiment of a given tweet.
• PyTorch-transformers pre-trained models
Your task (cont.)
• The output is one of three sentiment polarities:
• Positive: 2
• Neutral: 1
• Negative: 0
• Submission format:
• Follow the index number in test.csv
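A minimal sketch of producing predictions in the encoding above and writing them out in index order follows. The exact submission layout is not specified here, so the column names `index` and `sentiment`, the helper names, and the example scores are all assumptions; adjust them to match test.csv and the course instructions.

```python
import csv
import io


def argmax(scores):
    """Return the position of the largest score (0, 1, or 2 for three classes)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def write_submission(indices, logits, out_file):
    """Write one row per test example, preserving the given index order."""
    writer = csv.writer(out_file)
    writer.writerow(["index", "sentiment"])  # assumed header, not from the source
    for idx, scores in zip(indices, logits):
        writer.writerow([idx, argmax(scores)])


# Hypothetical indices and model scores for three test tweets.
test_indices = [0, 1, 2]
test_logits = [[0.1, 0.2, 2.5], [1.7, 0.3, -0.4], [0.0, 0.9, 0.2]]

buffer = io.StringIO()  # stands in for an open file such as a submission CSV
write_submission(test_indices, test_logits, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Writing to a real file works the same way: open it with `newline=""` and pass the file object in place of `buffer`.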